Abstract


Free words are elements of a free monoid, generated over an alphabet via the binary
operation of concatenation. Casually speaking, a free word is a finite string of letters.
Henceforth, we simply refer to them as words. Motivated by recent advances in
the combinatorial limit theory of graphs (notably those involving flag algebras, graph
homomorphisms, and graphons), we investigate the extremal and asymptotic theory
of pattern containment and avoidance in words.

Word V is a factor of word W provided V occurs as consecutive letters within W.
W is an instance of V provided there exists a nonerasing monoid homomorphism $\phi$
with $\phi(V) = W$. For example, using the homomorphism $\phi$ defined by $\phi(P)$ = Ror,
$\phi(h)$ = a, and $\phi(D)$ = baugh, we see that Rorabaugh is an instance of PhD.

W avoids V if no factor of W is an instance of V. V is unavoidable provided, over 
any finite alphabet, there are only finitely many words that avoid V. Unavoidable
words were classified by Bean, Ehrenfeucht, and McNulty (1979) and Zimin (1982).
We briefly address the following Ramsey-theoretic question: For unavoidable word V
and a fixed alphabet, what is the longest a word can be that avoids V?

The density of V in W is the proportion of nonempty substrings of W that are
instances of V. Since there are 45 substrings in Rorabaugh and 28 of them are 
instances of PhD, the density of PhD in Rorabaugh is 28/45. We establish a number 
of asymptotic results for word densities, including the expected density of a word in 
arbitrarily long, random words and the minimum density of an unavoidable word over 
arbitrarily long words. 

This is joint work with Joshua Cooper. 
Chapter 1


Background and Introduction 

1.1 Discrete Structures and Combinatorics 

Any mathematical structure that is enumerable or noncontinuous can be referred to 
as discrete. Discrete mathematicians, therefore, usually study such things as sets, 
integers, groups, graphs, logical statements, or geometric objects. However, even 
uncountable or continuous objects such as topological spaces, contours, differential 
equations, or dynamical systems can be discretized or otherwise studied by their 
discrete properties. 

Perhaps the structure most commonly identified with discrete mathematics is a
graph. A graph G consists of a set V(G) of points, called vertices or nodes, and a set
E(G) of unordered pairs of points, called edges. It is often represented visually, with
points or circles as vertices, and line segments that connect the points as edges. 

Though the term “discrete mathematics” can technically encompass any study 
of discrete objects, including much of algebra, number theory, logic, and theoretical 
computer science, it is more commonly used as a synonym for combinatorics. 

Combinatorialists are, generally speaking, interested in counting. Of the nature of 
combinatorics, Cameron (1994) says: “Its tentacles stretch into virtually all corners 
of mathematics.” Though some mathematical structures are inherently more discrete, 
and thus more susceptible to combinatorial analysis, any structure can be the subject 
of combinatorial investigation. Two particular combinatorial perspectives, Ramsey 
theory and extremal theory, are especially important for the present work. 
1.1.1 Ramsey Theory


Ramsey (1929) proved that, for any fixed $r, n, \mu \in \mathbb{Z}^+$, every sufficiently large set
$\Gamma$ with its $r$-subsets partitioned into $\mu$ classes is guaranteed to have an $n$-element
subset $\Delta_n \subseteq \Gamma$ such that all the $r$-subsets of $\Delta_n$ are in the same class. This was
the advent of a major branch of combinatorics known as Ramsey theory. If a given 
property holds for every sufficiently “large” structure within a class of structures, 
then a combinatorialist might investigate how large a structure must be to guarantee 
the property. 

1.1.2 Extremal Theory 

In combinatorial optimization, we look at structures subject to given constraints 
and ask: “What are the optimal values obtained by such-and-such function within 
these constraints?” or “Which structures satisfy the constraints and optimize the 
function?” That is, we might try to find extremal values and a characterization of 
the structures which obtain the extremal values. A foundational example of this 
school of thought comes from Turán (1941), who classified graphs on $n$ vertices with
the highest possible number of edges but with no set of $k + 1$ vertices for which all
possible edges are present. 

1.2 Words 

Our present interest is in words: not the linguistic units with lexical value, but rather
strings of symbols or letters. We are interested in words as abstract discrete structures.
There are many different ways discrete mathematicians view words: as sequences,
permutations, elements of a monoid, etc. Within each perspective there is a
distinct set of axioms for how words are built and how they interact. Consequently,
the theory and applications that arise for each perspective are drastically different.
One ubiquitous approach for studying discrete structures is to consider the substructures.
In the case of sequences or permutations, the “subword” generally consists of
a subsequence of not-necessarily consecutive terms.

Some number theorists and combinatorialists study sequences of numbers, for
example: 1, 1, 2, 3, 5, 8, 13, 21, 34, .... A numeric list might be generated by a
recursive formula ($f(1) = f(2) = 1$, $f(n+2) = f(n+1) + f(n)$), an explicit formula
($f(n) = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n\right]$),
or the enumeration of a particular class of structures ($f(n)$ is the number of ways
to tile a $2 \times (n-1)$ rectangle with $2 \times 1$ dominoes).
See the Online Encyclopedia of Integer Sequences (OEIS Foundation Inc.
2011) for many such sequences, including oeis.org/A000045, the Fibonacci sequence.
There are natural questions one might ask about such a sequence: Is it periodic? Is
it bounded? Does it converge? What is the asymptotic rate of growth?
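
As a small illustration, the three descriptions of this sequence can be compared numerically. This is only a sketch: the function names are hypothetical, and the closed form used is the standard Binet formula, assumed here to be the explicit formula intended above.

```python
from math import sqrt

def fib_recursive(n):
    """f(1) = f(2) = 1 and f(n + 2) = f(n + 1) + f(n)."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def fib_explicit(n):
    """Binet's closed form, rounded to absorb floating-point error (small n only)."""
    s = sqrt(5)
    return round((((1 + s) / 2) ** n - ((1 - s) / 2) ** n) / s)

def count_tilings(width):
    """Count domino tilings of a 2 x width board by covering the first empty cell."""
    def fill(covered):
        empty = [(c, r) for c in range(width) for r in range(2) if (r, c) not in covered]
        if not empty:
            return 1
        c, r = empty[0]  # first empty cell in column-major order
        total = 0
        if r == 0 and (1, c) not in covered:             # vertical domino
            total += fill(covered | {(0, c), (1, c)})
        if c + 1 < width and (r, c + 1) not in covered:  # horizontal domino
            total += fill(covered | {(r, c), (r, c + 1)})
        return total
    return fill(frozenset())
```

For small n, `fib_recursive(n)`, `fib_explicit(n)`, and `count_tilings(n - 1)` should all agree.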

The elements of a sequence need not be numbers to be of mathematical interest. 
In a sequence of colors, for example, one can identify the frequency with which yellow 
appears, or the probability that red is followed by blue, or whether there exists a 
subsequence of k black entries that are equally spaced in the original sequence. One 
seminal result on nonnumeric sequences was by van der Waerden (1927), who showed 
that, for any positive integers k and r, every sufficiently long sequence containing at 
most r distinct colors contains a monochromatic k-term arithmetic progression (i.e.,
a length-k subsequence of a single color and equally spaced terms).

A large body of work exists for permutations, which are sequences of elements of
a linearly ordered set (generally with no element occurring twice). The substructures
for permutations are subsequences, which are usually only identified in terms of their
permutation pattern $\sigma$. For example, the permutation 1342 encounters the pattern
$\sigma = 1$ (via subsequences 1, 3, 4, and 2), $\sigma = 12$ (via 13, 14, 12, and 34), $\sigma = 21$ (32
and 42), $\sigma = 123$ (134), $\sigma = 132$ (132 and 142), $\sigma = 231$ (342), and $\sigma = 1342$ (1342).
Perhaps the first work on permutation patterns was that of MacMahon (1915), who
showed that 132-avoiding permutations are enumerated by the Catalan numbers (see
oeis.org/A000108). For more on permutation patterns, see Kitaev (2011).
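
The pattern occurrences listed for 1342 can be checked by brute force. The following is a minimal sketch with hypothetical helper names; it enumerates all subsequences, so it is only practical for short permutations.

```python
from itertools import combinations, permutations

def pattern_of(seq):
    """Relative order of seq, e.g. (3, 4, 2) -> (2, 3, 1)."""
    ranks = {value: rank + 1 for rank, value in enumerate(sorted(seq))}
    return tuple(ranks[value] for value in seq)

def occurrences(perm, pattern):
    """All subsequences of perm (in position order) whose relative order is pattern."""
    pattern = tuple(pattern)
    return [sub for sub in combinations(perm, len(pattern)) if pattern_of(sub) == pattern]

def count_avoiders(n, pattern):
    """Number of permutations of 1..n with no occurrence of pattern."""
    return sum(1 for p in permutations(range(1, n + 1)) if not occurrences(p, pattern))
```

Here `occurrences((1, 3, 4, 2), (1, 3, 2))` returns the two subsequences 132 and 142 named above, and `count_avoiders(n, (1, 3, 2))` can be compared against the Catalan numbers for small n.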

For our present study of words, we consider only “subwords” that consist of consecutive
letters. This is the perspective that holds for elements of a free monoid. A
monoid is an algebraic structure consisting of a set, an associative binary operation
on the set, and an identity element. A free monoid is defined over some generating
set of elements, which we view as an alphabet of letters. Its binary operation is
simply concatenation, its elements (called free words) are all finite strings of letters,
and its identity element is the empty word (generally denoted with $\varepsilon$ or $\lambda$). Often,
the operation of a monoid is called multiplication, so it is fitting that a “subword”
of a free word is called a “factor.” For example, in the free monoid over alphabet
{a, b, c, d, r}, the word cadabra is a factor of abracadabra because abracadabra is the
product of abra and cadabra.

If there is an inverse element $s^{-1}$ for every element $s$ in the generating set, we are
dealing with a free group. Then any word with $ss^{-1}$ or $s^{-1}s$ as a factor is equivalent to
the word obtained by removal of said factor. For example, $tee^{-1}hee^{-1}e$ is equivalent
to the reduced word $the$. Within what came to be known as combinatorial group theory,
Dehn (1911) first proposed the Word Problem for Groups: Given two words formed
from the set of generators of a group, determine whether the words represent the
same group element.

1.3 Combinatorial Limit Theory 

In an era of massive technological and computational advances, we have large systems
for transportation, communication, education, and commerce (to name a few
examples). We also possess massive quantities of information in every part of life.
Therefore, in many applications of discrete mathematics, the useful theory is that
which is relevant to arbitrarily large discrete structures. For example, graphs can be
used to model a computer network, with each vertex representing a device and each
edge a data connection between devices. The most well-known computer network,
the Internet, consists of billions of devices with constantly changing connections; one
cannot simply create a database of all billion-vertex graphs and their properties.

We use the term “combinatorial limit theory” in general reference to combinatorial
methods which help answer the following question: What happens to discrete
structures as they grow large? Many classical questions from combinatorics fall naturally
into this field of study. One incredibly productive approach to handling large
discrete structures is the probabilistic method, the origin of which is generally credited
to Paul Erdős. See Alon and Spencer (2008) for standard probabilistic tools
used in combinatorics. Many asymptotic results from such methods, which may be
wildly inaccurate for small values, become increasingly more accurate as the relevant
structures grow.

In the combinatorial limit theory of graphs, major recent developments include the
flag algebras of Razborov (2007) and the graph limits of Borgs, Chayes, Freedman,
Lovász, Schrijver, Sós, Szegedy, Vesztergombi, etc. (see Lovász 2012). Given the
fundamental reliance of these methods on graph homomorphisms and graph densities, 
we strive to apply the same ideas to words. We discuss graph limits in more detail 
when describing future research directions in Section 6.2. 

1.4 Combinatorics of Free Words 

We are henceforth focused on free words, which we will simply call words. For a 
summary of notation used throughout this text, see Appendix E. 

Definition 1.1. For a fixed set $\Sigma$, called an alphabet, denote with $\Sigma^*$ the set of all
finite words formed by concatenation of elements of $\Sigma$, called letters. Words in $\Sigma^*$ are
called $\Sigma$-words. The set of length-n $\Sigma$-words is denoted with $\Sigma^n$. The empty word,
$\varepsilon$, consisting of zero letters, is a $\Sigma$-word for any alphabet $\Sigma$.

The set $\Sigma^*$, together with the associative binary operation of concatenation and
the identity element $\varepsilon$, forms a free monoid. We denote concatenation with juxtaposition.
Generally we use natural numbers or minuscule Roman letters as letters and
majuscule Roman letters (especially T, U, V, W, X, Y, and Z) to name words. Majuscule
Greek letters (especially $\Gamma$ and $\Sigma$) name alphabets, though for a standard q-letter
alphabet, we frequently use the set $[q] = \{1, 2, \ldots, q\}$.

Example 1.2. Alphabet [3] consists of letters 1, 2, and 3. The set of [3]-words is 


$\{1, 2, 3\}^* = \{\varepsilon, 1, 2, 3, 11, 12, 13, 21, 22, 23, 31, 32, 33, 111, 112, 113, 121, \ldots\}$.


Definition 1.3. A word W is formed from the concatenation of finitely many letters.
If letter x is one of the letters concatenated to form W, we say x occurs in W, or
$x \in W$. For natural number $n \in \mathbb{N}$, an n-fold concatenation of word W is denoted
$W^n$. The length of word W, denoted $|W|$, is the number of letters in W, counting
multiplicity. $L(W)$, the alphabet generated by W, is the set of all letters that occur
in W. For $q \in \mathbb{N}$, word W is q-ary provided $|L(W)| \le q$. We use $\|W\|$ to denote the
number of letter recurrences in W, so $\|W\| = |W| - |L(W)|$.

Example 1.4. Let W = bananas. Then $a, b \in W$, but $c \notin W$. Also $|W| = 7$,
$L(W) = \{a, b, n, s\}$, and $\|W\| = 3$.

For the empty word, we have $|\varepsilon| = 0$, $L(\varepsilon) = \emptyset$, and $\|\varepsilon\| = 0$.

Definition 1.5. Word W has $\binom{|W|+1}{2}$ (nonempty) substrings, each defined by an
integer pair $(i, j)$ with $0 \le i < j \le |W|$. Denote with $W[i, j]$ the word in the
$(i, j)$-substring, consisting of $j - i$ consecutive letters of W, beginning with the $(i+1)$-th.

V is a factor of W, denoted $V \le W$, provided $V = W[i, j]$ for some integers i and
j with $0 \le i < j \le |W|$; equivalently, $W = SVT$ for some (possibly empty) words S
and T.


Example 1.6. $nana \le nana \le bananas$, with $nana = nana[0, 4] = bananas[2, 6]$.
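
The notation of Definitions 1.3 and 1.5 maps directly onto string operations. The sketch below uses hypothetical function names and treats words as Python strings; note that Python's slice `w[i:j]` coincides with $W[i, j]$ as defined above.

```python
from math import comb

def alphabet(word):
    """L(W): the set of letters occurring in the word."""
    return set(word)

def recurrences(word):
    """||W|| = |W| - |L(W)|: the number of letter recurrences."""
    return len(word) - len(alphabet(word))

def substrings(word):
    """All (i, j, W[i, j]) with 0 <= i < j <= |W|."""
    n = len(word)
    return [(i, j, word[i:j]) for i in range(n) for j in range(i + 1, n + 1)]

def is_factor(v, w):
    """V <= W for nonempty V: V equals W[i, j] for some pair (i, j)."""
    return any(s == v for _, _, s in substrings(w))
```

For W = bananas this gives 28 substrings, matching `comb(8, 2)`, and `"bananas"[2:6]` is `"nana"`.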
1.5 Word Avoidability


Definition 1.7. For alphabets $\Gamma$ and $\Sigma$, every (monoid) homomorphism $\phi : \Gamma^* \to \Sigma^*$
is uniquely defined by a function $\phi : \Gamma \to \Sigma^*$. We call a homomorphism nonerasing
provided it is defined by $\phi : \Gamma \to \Sigma^* \setminus \{\varepsilon\}$; that is, no letter maps to $\varepsilon$.


Example 1.8. Consider the homomorphism $\phi : \{b, n, s, u\}^* \to \{m, n, o, p, r, v\}^*$ defined
by Table 1.1. Then $\phi(sun) = moon$ and $\phi(bus) = vroom$.


Table 1.1 Example nonerasing function.

x            b     n     s     u
$\phi(x)$    vr    n     m     oo
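
Example 1.8 can be reproduced by storing the letter images of Table 1.1 in a dictionary. This is a minimal sketch; the names `apply_homomorphism` and `is_nonerasing` are hypothetical.

```python
def apply_homomorphism(phi, word):
    """Extend a letter-to-word map phi to whole words by concatenating images."""
    return "".join(phi[letter] for letter in word)

def is_nonerasing(phi):
    """A homomorphism is nonerasing when no letter maps to the empty word."""
    return all(image != "" for image in phi.values())

# Letter images from Table 1.1.
phi = {"b": "vr", "n": "n", "s": "m", "u": "oo"}
```

With this map, `apply_homomorphism(phi, "sun")` gives `"moon"` and `apply_homomorphism(phi, "bus")` gives `"vroom"`.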


Definition 1.9. U is an instance of V, or a V-instance, provided $U = \phi(V)$ for some
nonerasing homomorphism $\phi$; equivalently,

• $V = x_0 x_1 \cdots x_{m-1}$ where each $x_i$ is a letter;

• $U = A_0 A_1 \cdots A_{m-1}$ with each word $A_i \ne \varepsilon$ and $A_i = A_j$ whenever $x_i = x_j$.

W encounters V, denoted $V \preceq W$, provided $U \le W$ for some V-instance U. If W
fails to encounter V, we say W avoids V.

To help distinguish the encountered word and the encountering word, “pattern”
is elsewhere used to refer to V in the encounter relation $V \preceq W$. Also, an instance
of a word is sometimes called a “substitution instance” and “witness” is sometimes
used in place of encounter.
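
The second characterization in Definition 1.9 suggests a direct backtracking test: read the pattern left to right, assign each new letter a nonempty image, and require repeated letters to reuse their image. The sketch below is one possible implementation with hypothetical names, not an optimized algorithm.

```python
def is_instance(u, v, assignment=None):
    """True if u = phi(v) for some nonerasing homomorphism phi."""
    assignment = {} if assignment is None else assignment
    if not v:
        return not u
    x, rest = v[0], v[1:]
    if x in assignment:
        image = assignment[x]
        return u.startswith(image) and is_instance(u[len(image):], rest, assignment)
    # Try each nonempty prefix as the image of x, leaving at least
    # one letter of u for every remaining letter of the pattern.
    for k in range(1, len(u) - len(rest) + 1):
        assignment[x] = u[:k]
        if is_instance(u[k:], rest, assignment):
            return True
        del assignment[x]
    return False

def encounters(w, v):
    """True if some nonempty factor of w is an instance of v."""
    n = len(w)
    return any(is_instance(w[i:j], v) for i in range(n) for j in range(i + 1, n + 1))
```

For example, `is_instance("vroom", "bus")` holds, while `encounters("abacaba", "xx")` is false because abacaba contains no square.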

1.5.1 r-th Power-Free Words

The earliest results in avoidability involved avoiding words of the form $x^r$. When
specifically discussing $x^r$-avoidance, the term r-th power-free is generally used (or
square-free for r = 2 and cube-free for r = 3). We see in Figure 1.1 that only finitely
many square-free words exist over a given two-letter alphabet. However, Thue (1906)
demonstrated the existence of arbitrarily long (even infinite), ternary, square-free
words.




Figure 1.1 Binary words that avoid xx.
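
The finiteness claim for binary square-free words is easy to confirm by exhaustive extension. The following sketch (hypothetical function names) grows square-free words one letter at a time; since every prefix is already square-free, only squares ending at the last letter need to be checked.

```python
def has_square_suffix(word):
    """True if the word ends in a square xx with x nonempty."""
    n = len(word)
    return any(word[n - 2 * k:n - k] == word[n - k:] for k in range(1, n // 2 + 1))

def square_free_words(alphabet, max_length):
    """All square-free words over the alphabet, up to the given length."""
    found, frontier = [""], [""]
    for _ in range(max_length):
        frontier = [w + c for w in frontier for c in alphabet
                    if not has_square_suffix(w + c)]
        if not frontier:
            break
        found.extend(frontier)
    return found
```

Over a two-letter alphabet the search stops at length 3, whereas over three letters it keeps producing longer words, in line with Thue's result.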


In the 1970s, a number of important results were proved regarding square-free
words. For example: Ježek (1976) showed that there exists an infinite set T of ternary
square-free words such that, for each $W \in T$, every word in $T \setminus \{W\}$ avoids
W; Li (1976) characterized all maximal square-free words. Within their seminal
work on avoidability (the central result of which we discuss later), Bean, Ehrenfeucht, and
McNulty (1979) defined two interesting homomorphisms that preserved the property
of being r-th power-free. In particular, $h : \mathbb{N}^* \to [3]^*$ preserves it for $r \ge 2$ and
$\sigma : \mathbb{N}^* \to [2]^*$ for $r \ge 3$.

1.5.2 k-Avoidability 

Definition 1.10. A word V is k-avoidable provided, over a fixed alphabet of size k,
there are infinitely many words that avoid V. Conversely, V is k-unavoidable provided
every sufficiently long word with at most k distinct letters encounters V.

We saw in Section 1.5.1 that the word xx is 3-avoidable but 2-unavoidable. A
word is doubled provided every letter in the word occurs at least twice. Every doubled
word is k-avoidable for some k > 1 (see Lothaire 2002).

Theorem 1.11 (Blanchet-Sadri and Woodhouse 2013, Theorem 2). “Let p be a [word]
of m distinct [letters].

1. If $|p| \ge 3(2^{m-1})$, then p is 2-avoidable.

2. If $|p| \ge 2^m$, then p is 3-avoidable.”

There remain a number of open problems regarding which words are k-avoidable
for particular k. See Lothaire (2002) and Currie (2005) for surveys on avoidability
results.

1.5.3 General Avoidability 

Definition 1.12. A word V is unavoidable provided, for any finite alphabet, there
are only finitely many words that avoid V; equivalently, V is k-unavoidable for all
$k \ge 2$.

The first classification of unavoidable words (Theorem 1.14) was by Bean, Ehrenfeucht,
and McNulty (1979), using the following definitions.

Definition 1.13. “Let W be a word. The letter x is free for W provided x occurs in
W and for no $n \in \omega$ is it possible to find letters $e_0, \ldots, e_n$ and $f_0, \ldots, f_n$ such that
all of the following are [factors] of W:

$x e_0, \; f_0 e_0, \; f_0 e_1, \; f_1 e_1, \; \ldots, \; f_n e_n, \; f_n x$.”

“If x is free for W, then $W^x$ is the word obtained from W by deleting all occurrences
of x.”
“U is obtained from W by identification of letters whenever” for some letters “x
and y [...] occurring in W, U is the word obtained from W by substituting x for y.”

“W reduces to U provided there are words $V_0, V_1, \ldots, V_{n-1}$ with $W = V_0$, $U = V_{n-1}$
and [either] $V_{i+1} = V_i^x$ for some letter x free in $V_i$ or $V_{i+1}$ is obtained from $V_i$ by
identification of letters, for all i with $0 \le i < n - 1$.”

Theorem 1.14 (Bean, Ehrenfeucht, and McNulty 1979, Theorem 3.22). “The word
W is unavoidable if and only if W reduces to a word of length one.”

Three years later, Zimin published a fundamentally different classification of unavoidable
words (Zimin 1982 in Russian, Zimin 1984 in English).

Definition 1.15. Define the n-th Zimin word recursively by $Z_0 := \varepsilon$ and, for $n \in \mathbb{N}$,
$Z_{n+1} = Z_n x_n Z_n$. Using the English alphabet rather than indexed letters:

$Z_1 = a$, $Z_2 = aba$, $Z_3 = abacaba$, $Z_4 = abacabadabacaba$, ....

Equivalently, $Z_n$ can be defined over the natural numbers as the word of length
$2^n - 1$ such that the i-th letter, $1 \le i < 2^n$, is the 2-adic order of i.

Theorem 1.16 (Zimin 1984). A word V with n distinct letters is unavoidable if and 
only if Zn encounters V. 
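
Both descriptions of the Zimin words in Definition 1.15, and the criterion of Theorem 1.16, can be sketched in a few lines. The encounter test below translates a pattern into a regular expression with one nonempty named group per distinct letter; this regex encoding and all function names are illustrative choices, not taken from the source.

```python
import re

def zimin(n):
    """Z_n over the letters a, b, c, ...: Z_0 is empty, Z_{k+1} = Z_k x_k Z_k."""
    word = ""
    for k in range(n):
        word = word + chr(ord("a") + k) + word
    return word

def zimin_2adic(n):
    """Z_n over the naturals: the i-th letter is the 2-adic order of i."""
    return [(i & -i).bit_length() - 1 for i in range(1, 2 ** n)]

def pattern_regex(v):
    """Regex whose matches are the instances of v (backreferences reuse images)."""
    names, parts = {}, []
    for x in v:
        if x in names:
            parts.append(f"(?P={names[x]})")
        else:
            names[x] = f"g{len(names)}"
            parts.append(f"(?P<{names[x]}>.+)")
    return re.compile("".join(parts), re.DOTALL)

def is_unavoidable(v):
    """Zimin's criterion: v with n distinct letters is unavoidable iff Z_n encounters v."""
    return pattern_regex(v).search(zimin(len(set(v)))) is not None
```

Since $Z_n$ has length $2^n - 1$, this check is only sensible for patterns with a handful of distinct letters.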

$Z_n$-instances are precisely sesquipowers of order n. From Berstel et al. (2008),
“any nonempty word is a sesquipower of order 1; a word w over an alphabet A is a
sesquipower of order n > 1 if $w = w_0 v w_0$ for some words $w_0, v \in A^*$ with $v \ne \varepsilon$ and
$w_0$ a sesquipower of order n − 1.”

1.5.4 A Ramsey-Type Question 

With Zimin’s concise characterization of unavoidable words, a natural combinatorial
question follows: How long must a q-ary word be to guarantee that it encounters a
given unavoidable word? By Definition 2.1, $f(n, q)$ is the smallest integer M such
that every q-ary word of length M encounters $Z_n$.

In 2014, three papers by different authors appeared, each independently proving
bounds for $f(n, q)$. Cooper and Rorabaugh (2014) showed that (Theorems 2.2, 2.9)

$q^{2^{(n-1)}(1+o(1))} \le f(n, q) \le {}^{n-1}(2q+1)$,

where ${}^{b}a$ denotes an exponential tower with b copies of a. These results were presented
at the 45th Southeast International Conference on Combinatorics, Graph Theory, and
Computing in March 2014.

In June, Tao (2014+) introduced a more general function $L(q, V)$ for what he
calls the “Ramsey number” of any unavoidable word V. He also attained similar
lower and upper bounds for $L(q, Z_n) = f(n, q)$. Tao’s lower bound, which we restate
as Theorem 2.10, is even more general, applying to any unavoidable word.

In September, Rytter and Shur (2014+) also introduced the function $f(n, q)$, together
with the concept of “minimal words of Zimin type n”; that is, instances of
$Z_n$ which contain no $Z_n$-instance as a proper factor. We call such words minimal
$Z_n$-instances. Using minimal instances, and some computation, Rytter and Shur establish
the best known upper bounds for $f(3, q)$ and $f(4, 2)$. We restate their results
in Section 2.3 for further use.

A factor-avoidance variant of this function has been considered at least as early
as the German work of Evdokimov (1983), some results of which were made more
readily available in English by Burstein and Kitaev (2006). For some fixed alphabet
A, a set of words S is called unavoidable provided there are only finitely many words
in $A^*$ that do not contain any word in S as a factor. Note that if the alphabet has at
least 2 letters, every nonempty word by itself is avoidable. In Kitaev’s work, $L_w(n)$
is the maximum length of a word in $A^*$ that avoids some unavoidable set $S \subseteq A^n$.
Theorem 1.17 (Evdokimov 1983, Theorem 1; Burstein and Kitaev 2006, Theorem 2.3).

$L_w(n) = |A|^{n-1} + n - 2$.


1.6 Word Densities 

Given nonempty words V and W, the (instance) density of V in W, denoted $\delta(V, W)$,
is the proportion of substrings of W that are instances of V. For example, two
of the substrings of banana are xx-instances: anan and nana. Therefore,

$\delta(xx, banana) = 2 / \binom{7}{2}$.
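
The density just defined can be computed by testing every substring. The sketch below is a brute-force illustration with hypothetical names; it encodes a pattern as a regular expression (one nonempty named group per distinct letter, with backreferences for repeats) and uses exact rational arithmetic.

```python
import re
from fractions import Fraction

def instance_regex(v):
    """Regex that fully matches exactly the instances of the word v."""
    names, parts = {}, []
    for x in v:
        if x in names:
            parts.append(f"(?P={names[x]})")
        else:
            names[x] = f"g{len(names)}"
            parts.append(f"(?P<{names[x]}>.+)")
    return re.compile("".join(parts), re.DOTALL)

def density(v, w):
    """Proportion of the nonempty substrings W[i, j] that are instances of v."""
    pattern = instance_regex(v)
    n = len(w)
    hits = sum(1 for i in range(n) for j in range(i + 1, n + 1)
               if pattern.fullmatch(w[i:j]))
    return Fraction(hits, n * (n + 1) // 2)
```

Here `density("xx", "banana")` evaluates to 2/21, and `density("PhD", "Rorabaugh")` to 28/45, since every substring of length at least 3 is an instance of a word with three distinct letters.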

Recall that a word V is doubled provided every letter in V occurs at least twice.
For a doubled word V with $k \ge 2$ distinct letters and an alphabet $\Sigma$ with $|\Sigma| = q \ge 4$,
$(k, q) \ne (2, 4)$, Bell and Goh (2007) showed that there are at least $\lambda(k, q)^n$ words in
$\Sigma^n$ that avoid V, for an explicit function $\lambda$ defined in their paper.

This exponential lower bound on the number of words avoiding a doubled word hints
at the moral of Chapter 4: instances of doubled words are rare. For doubled word V
and an alphabet $\Sigma$ with $q \ge 2$ letters, the probability that a random word $W_n \in \Sigma^n$
encounters V is asymptotically 1. Indeed, the event that $W_n[b|V|, (b+1)|V|]$ is an
instance of V has nonzero probability and is independent for distinct $b \in \mathbb{N}$. Nevertheless,
the expected density $\delta_n(V, q) = E(\delta(V, W_n))$ (Definition 4.1) is asymptotically
negligible. Specifically, the central result of Chapter 4 is the following dichotomy.

Theorem (4.4). Let V be a word on any alphabet. Fix integer $q \ge 2$. V is doubled
if and only if $\delta(V, q) = \lim_{n \to \infty} \delta_n(V, q) = 0$.

For doubled V, not only does $\delta(V, q) = 0$, but we establish tight concentration of
$\delta(V, W_n)$ for random word $W_n \in [q]^n$.
Theorem (4.19, 4.20). Let V be a doubled word, $q \ge 2$, and $W_n \in [q]^n$ chosen
uniformly at random. Then $\operatorname{Var}(\delta(V, W_n))$ is bounded above by a power of $\log n$
divided by a power of n; see Theorems 4.19 and 4.20 for the precise bounds.


For nondoubled V, we know from the dichotomy that, if $\delta_n(V, q)$ converges, its
limit is not 0. To get a handle on the nondoubled case, we consider instances of
specified length, a perspective used in the proof of Theorem 2.9. From Definition 2.4:
Let $I_n(W, \Sigma)$ be the set of W-instances in $\Sigma^n$, and $I_n(W, q)$ the probability that a
random length-n q-ary word is a W-instance; that is,

$I_n(W, |\Sigma|) = \frac{|I_n(W, \Sigma)|}{|\Sigma|^n}$.


Example 1.18. $I_4(wow, [2]) = \{1111, 1121, 1211, 1221, 2112, 2122, 2212, 2222\}$ and
$I_4(wow, 2) = 8/16 = 1/2$.
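
The set in Example 1.18 can be generated constructively: pick a positive image length for each distinct letter so that the total length is n, then substitute every possible image. The following is a sketch under that approach, with hypothetical function names and letters written as the digits 1 to q (so it assumes q is at most 9).

```python
from fractions import Fraction
from itertools import product

def image_lengths(total, counts):
    """Yield positive lengths (l_1, ..., l_k) with sum(counts[i] * l_i) == total."""
    if not counts:
        if total == 0:
            yield ()
        return
    c = counts[0]
    for length in range(1, total // c + 1):
        for tail in image_lengths(total - c * length, counts[1:]):
            yield (length,) + tail

def instances(v, q, n):
    """The set of length-n instances of v over the alphabet {1, ..., q}."""
    letters = sorted(set(v))
    counts = [v.count(x) for x in letters]
    digits = "".join(str(d) for d in range(1, q + 1))
    found = set()
    for lengths in image_lengths(n, counts):
        choices = [["".join(t) for t in product(digits, repeat=l)] for l in lengths]
        for images in product(*choices):
            phi = dict(zip(letters, images))
            found.add("".join(phi[x] for x in v))
    return found

def instance_probability(v, q, n):
    """Probability that a uniformly random length-n q-ary word is a v-instance."""
    return Fraction(len(instances(v, q, n)), q ** n)
```

Collecting the images into a set matters, because different homomorphisms can produce the same word.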


Theorem (4.11, 4.12). Fix word V and positive integer q. The limits $\delta(V, q)$ and
$I(V, q) = \lim_{n \to \infty} I_n(V, q)$ both exist, and $\delta(V, q) = I(V, q)$.


We also establish bounds for $I(V, q)$ under various conditions.


1.7 Looking Forward 

There are still many unexplored avenues within the combinatorial limit theory of 
free words. The final part of this work, Chapter 6, summarizes a few directions for
further development. There we also pose a number of open questions that arise from 
the present research. 
Chapter 2


Bounds on Zimin Word Avoidance 

Recall that V is unavoidable provided, for any finite alphabet, there are only finitely
many words that avoid (i.e., do not encounter) V. Moreover, we stated Zimin’s
classification (Theorem 1.16) that the unavoidable words are precisely the words
encountered by what are now known as Zimin words (Definition 1.15):

$Z_1 = a$, $Z_2 = aba$, $Z_3 = abacaba$, $Z_4 = abacabadabacaba$, ....

Cooper and Rorabaugh (2014), Tao (2014+), and Rytter and Shur (2014+)
independently began investigating bounds on the length of words that avoid unavoidable
words.

2.1 Avoiding the Unavoidable 

From Zimin’s explicit classification of unavoidable words, a natural question arises
in the Ramsey-theoretic paradigm: for a fixed unavoidable word V, how long can a
word be that avoids V? Our approach to this question is to start with avoiding the
Zimin words, which gives upper bounds for all unavoidable words.

Definition 2.1. $f(n, q)$ is the least integer M such that every q-ary word of length
M encounters $Z_n$.

Let ${}^{b}a$ denote the towering exponential $a^{a^{\cdot^{\cdot^{\cdot^{a}}}}}$ with b occurrences of a. This tetration
is elsewhere denoted with Knuth’s up-arrow notation by $a \uparrow\uparrow b$. ${}^{0}a$ is defined to be 1.
Theorem 2.2 (Cooper and Rorabaugh 2014, Theorem 1.1). For $n, q \in \mathbb{Z}^+$,

$f(n, q) \le {}^{n-1}(2q+1)$.

Proof. We proceed via induction on n. For the base case, set n = 1. Every nonempty
word is an instance of $Z_1$, so $f(1, q) = 1$.

For the inductive hypothesis, assume the claim is true for some positive n and set
$T = f(n, q)$. That is, every q-ary word of length T encounters $Z_n$. Concatenate any
$q^T + 1$ strings $W_0, W_1, \ldots, W_{q^T}$ of length T with an arbitrary letter $a_i$ between $W_{i-1}$
and $W_i$ for each positive $i \le q^T$:

$U = W_0 a_1 W_1 a_2 W_2 a_3 \cdots W_{q^T - 1} a_{q^T} W_{q^T}$.

By the pigeonhole principle, $W_i = W_j$ for some $i < j$. That string, being length
T, encounters $Z_n$. Therefore, we have some word $W \le W_i$ that is an instance of $Z_n$
and shows up twice, disjointly, in U. The extra letter $a_{i+1}$ guarantees that the two
occurrences of W are not consecutive. This proves that an arbitrary word of length
$(T + 1)(q^T + 1) - 1$ witnesses $Z_{n+1}$, so

$f(n+1, q) \le (T+1)(q^T+1) - 1 \le (2q+1)^T \le {}^{n}(2q+1)$.

□
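
To see how much slack the final estimate introduces, one can iterate the recursion from the proof, $T \mapsto (T+1)(q^T+1) - 1$ starting from $T = 1$, and compare it with the tower ${}^{n-1}(2q+1)$. The sketch below uses hypothetical function names and is only meant for small n, since the values grow extremely fast.

```python
def tower(a, b):
    """Tetration: an exponential tower with b copies of a (1 when b is 0)."""
    result = 1
    for _ in range(b):
        result = a ** result
    return result

def recursive_bound(n, q):
    """Upper bound on f(n, q) from iterating T -> (T + 1)(q^T + 1) - 1, starting at 1."""
    T = 1
    for _ in range(n - 1):
        T = (T + 1) * (q ** T + 1) - 1
    return T
```

For q = 2 the recursion gives 1, 5, 197 for n = 1, 2, 3, against tower values 1, 5, 3125. The first step already matches the value $f(2, 2) = 5$.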

There is clearly a function $Q(n, q)$ such that $f(n+1, q) \le Q(n, q)^{f(n, q)}$ and $Q(n, q)$
tends to q as $n \to \infty$. No effort has been made to optimize the choice of function, as
such does not decrease the tetration in the bound.

The technique used to prove Theorem 2.2 is first found in Lothaire’s proof of
unavoidability of $Z_n$ (Lothaire 2002, 3.1.3). Tao (2014+) uses the same technique
with different approximation to establish a similar upper bound.

Theorem 2.3 (Tao 2014+, Theorem 6). For integers $n \ge 2$ and $q \ge 2$,

f(n,g) < 
The technique used in the original proof by Zimin (1984) implicitly gives, for $n \ge 2$,

$f(n+1, q+1) \le \left(f(n+1, q) + 2|Z_{n+1}|\right) f(n, \ldots)$.

This is an Ackermann-type function for an upper bound, which is much larger than 
the primitive recursive bound from Theorems 2.2 and 2.3. 

Table 2.1 shows known values of f(n, 2). Supporting word-lists and Sage code are 
found in Appendix A. 
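
The small values of $f(n, q)$ can be recovered by exhaustive search. The sketch below is not the Sage code of Appendix A; it is an independent illustration with hypothetical names, using the fact stated earlier that $Z_n$-instances are exactly the sesquipowers of order n. Words are grown letter by letter, and because every prefix already avoids $Z_n$, only suffixes of the extended word need testing.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def is_sesquipower(word, order):
    """True if word is a sesquipower of the given order, i.e. a Z_order-instance."""
    if order == 1:
        return len(word) >= 1
    return any(
        word[:k] == word[-k:] and is_sesquipower(word[:k], order - 1)
        for k in range(1, (len(word) - 1) // 2 + 1)
    )

def f(n, q):
    """Least M such that every q-ary word of length M encounters Z_n (n >= 1)."""
    alphabet = [chr(ord("a") + i) for i in range(q)]
    longest, stack = 0, [""]
    while stack:
        word = stack.pop()
        longest = max(longest, len(word))
        for c in alphabet:
            longer = word + c
            # Keep the extension only if no suffix is a Z_n-instance.
            if not any(is_sesquipower(longer[i:], n) for i in range(len(longer))):
                stack.append(longer)
    return longest + 1
```

The search terminates because $Z_n$ is unavoidable. It is quick for n at most 2; larger cases can be attempted the same way but take considerably longer.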

Table 2.1 Values of f(n, 2).

n    $Z_n$              f(n, 2)
0    $\varepsilon$      0
1    a                  1
2    aba                5
3    abacaba            29
4    abacabadabacaba    $\ge 10483$


2.2 Finding a Lower Bound with the First Moment Method 


Throughout this section, $\Sigma$ is a fixed alphabet with $|\Sigma| = q \ge 2$ letters.


Definition 2.4. Let $I_n(W, \Sigma)$ be the set of W-instances in $\Sigma^n$, and $I_n(W, q)$ the
probability that a random length-n q-ary word is a W-instance; that is,

$I_n(W, |\Sigma|) = \frac{|I_n(W, \Sigma)|}{|\Sigma|^n}$.

Lemma 2.5 (Cooper and Rorabaugh 2014, Lemma 2.1). For all $n, M \in \mathbb{Z}^+$,

$|I_{M+1}(Z_n, \Sigma)| \ge q \cdot |I_M(Z_n, \Sigma)|$.

Proof. Take arbitrary $W \in I_M(Z_n, \Sigma)$. By the recursive construction of $Z_n$, we can
write $W = W_1 W_0 W_1$ with $W_1 \in I_N(Z_{n-1}, \Sigma)$, where $2N < M$. Choose the decomposition
of W to minimize $|W_1|$. Then $W_1 W_0 x_i W_1 \in I_{M+1}(Z_n, \Sigma)$ for each $i < q$.

The lemma follows, unless a $Z_n$-instance of length $M + 1$ can be generated in
two ways; that is, if $W_1 W_0 a W_1 = V_1 V_0 b V_1$ for some $V_1 V_0 V_1 = V$, where $|V_1|$ is also
minimized. If $|V_1| < |W_1|$, then $V_1$ is a prefix and suffix of $W_1$, so $|W_1|$ was not
minimized. But if $|V_1| > |W_1|$, then $W_1$ is a prefix and suffix of $V_1$, so $|V_1|$ was not
minimized. Therefore, $|V_1| = |W_1|$, so $V_1 = W_1$, which implies $a = b$ and $V = W$. □

Corollary 2.6 (Cooper and Rorabaugh 2014, Corollary 2.2). For all $n, M \in \mathbb{Z}^+$,

$I_{M+1}(Z_n, q) \ge I_M(Z_n, q)$.
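
The growth inequality of Lemma 2.5 and the monotonicity of Corollary 2.6 can be checked on small cases by building the sets of Zimin instances recursively, as words of the form UVU with U a shorter $Z_{n-1}$-instance and V nonempty. The code below is a sketch with hypothetical names, with letters written as the digits 0 to q − 1 (so it assumes q is at most 10).

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

def all_words(q, length):
    """All words of the given length over the alphabet {0, ..., q-1}."""
    return ["".join(map(str, letters)) for letters in product(range(q), repeat=length)]

@lru_cache(maxsize=None)
def zimin_instances(n, length, q):
    """The set of Z_n-instances of the given length over a q-letter alphabet (n >= 1)."""
    if n == 1:
        return frozenset(all_words(q, length)) if length >= 1 else frozenset()
    found = set()
    for j in range(1, (length - 1) // 2 + 1):
        middles = all_words(q, length - 2 * j)
        for u in zimin_instances(n - 1, j, q):
            found.update(u + v + u for v in middles)
    return frozenset(found)

def instance_probability(n, length, q):
    """Probability that a uniformly random q-ary word of this length is a Z_n-instance."""
    return Fraction(len(zimin_instances(n, length, q)), q ** length)
```

Using sets removes the double counting that arises when an instance has several decompositions. For $Z_2$ over two letters, the counts at lengths 3, 4, 5 come out as 4, 8, 20.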

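Lemma 2.7 (Cooper and Rorabaugh 2014, Lemma 2.3). For all $n, M \in \mathbb{Z}^+$,

$|I_M(Z_n, \Sigma)| \le \left(\frac{q}{q-1}\right)^{n-1} q^{(M - 2^n + n + 1)}$.

Proof. The proof proceeds by induction on n. For the base case, set n = 1. Every
nonempty word is an instance of $Z_1$, so $|I_M(Z_1, \Sigma)| = q^M$.

For the inductive hypothesis, assume the inequality is true for some $n \in \mathbb{Z}^+$. The
first inequality below comes from the following overcount of $Z_{n+1}$-instances of length
M. Every such word can be written as UVU where U is a $Z_n$-instance of length
$j < M/2$. Since an instance of $Z_n$ can be no shorter than $Z_n$, $2^n - 1 \le j < M/2$. For each
possible j, there are $|I_j(Z_n, \Sigma)|$ ways to choose U and $q^{M-2j}$ ways to choose V. This
is an overcount, since a Zimin-instance may have multiple decompositions.

$|I_M(Z_{n+1}, \Sigma)| \le \sum_{j=2^n-1}^{\lfloor (M-1)/2 \rfloor} |I_j(Z_n, \Sigma)| \, q^{M-2j}$

$\le \sum_{j=2^n-1}^{\lfloor (M-1)/2 \rfloor} \left(\frac{q}{q-1}\right)^{n-1} q^{(j - 2^n + n + 1)} q^{M-2j}$

$= \left(\frac{q}{q-1}\right)^{n-1} q^{(M - 2^n + n + 1)} \sum_{j=2^n-1}^{\lfloor (M-1)/2 \rfloor} q^{-j}$

$< \left(\frac{q}{q-1}\right)^{n-1} q^{(M - 2^n + n + 1)} \cdot q^{-(2^n - 1)} \cdot \frac{q}{q-1}$

$= \left(\frac{q}{q-1}\right)^{(n-1)+1} q^{(M - 2^{(n+1)} + (n+1) + 1)}$.

□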
Corollary 2.8 (Cooper and Rorabaugh 2014, Corollary 2.4). For all $n, M \in \mathbb{Z}^+$,

$I_M(Z_n, q) \le \left(\frac{q}{q-1}\right)^{n-1} q^{(-2^n + n + 1)}$.


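Theorem 2.9 (Cooper and Rorabaugh 2014, Theorem 2.5). As $q \to \infty$ or $n \to \infty$,

$f(n, q) \ge \sqrt{\frac{2 q^{2^n}}{q^{(n+1)} e^{(n-1)/(q-1)}}} - 1 = q^{2^{(n-1)}(1+o(1))}$.

Proof. Let word W consist of M uniform, independent random selections from $\Sigma$. Define
the random variable X to count the number of subwords of W that are instances
of $Z_n$ (including repetition if a single subword occurs multiple times in W):

$X = \left|\{(i, j) \mid 0 \le i < j \le M, \; W[i, j] \in I_{(j-i)}(Z_n, \Sigma)\}\right|$.

By monotonicity with respect to word length (Corollary 2.6):

$E(X) = \sum_{0 \le i < j \le M} I_{(j-i)}(Z_n, q)$

$\le \left|\{(i, j) \mid 0 \le i < j \le M\}\right| \cdot I_M(Z_n, q)$

$\le \binom{M+1}{2} \left(\frac{q}{q-1}\right)^{n-1} q^{(-2^n + n + 1)}$

$< \frac{1}{2}(M+1)^2 \, e^{(n-1)/(q-1)} \, q^{(-2^n + n + 1)}$.

There exists a word of length M that avoids $Z_n$ when $E(X) < 1$. It suffices to
show that:

$\frac{1}{2}(M+1)^2 \, e^{(n-1)/(q-1)} \, q^{(-2^n + n + 1)} \le 1$. (2.1)

Solving (2.1) for M:

$M \le \sqrt{2 q^{(2^n - n - 1)} e^{-(n-1)/(q-1)}} - 1 = q^{2^{(n-1)}(1+o(1))}$.

□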
Tao (2014+) uses the probabilistic method and generating functions to prove
a more general result.


Theorem 2.10 (Tao 2014+, Corollary 1). Suppose word V has r distinct letters with
multiplicities $1 = k_1 = \cdots = k_s < k_{s+1} \le \cdots \le k_r$. If

$n < (1 + o(1)) \left( (s+1)! \prod_{j=s+1}^{r} \left(q^{k_j - 1} - 1\right) \right)^{1/(s+1)}$,

there is a length-n q-ary word that avoids V.


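Applying Theorem 2.10 to Zimin words, Tao obtains

$f(n, q) \ge (1 + o(1)) \sqrt{2 \prod_{j=1}^{n-1} \left(q^{2^j - 1} - 1\right)}$.

As $q \to \infty$,

$\sqrt{2 \prod_{j=1}^{n-1} \left(q^{2^j - 1} - 1\right)} \sim \sqrt{2 \prod_{j=1}^{n-1} q^{2^j - 1}} = \sqrt{2 q^{(2^n - n - 1)}}$,

and as $n \to \infty$,

$\sqrt{2 \prod_{j=1}^{n-1} \left(q^{2^j - 1} - 1\right)} = q^{2^{n-1}(1+o(1))}$.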

2.3 Using Minimal Zimin-Instances 

Definition 2.11. For fixed $n \in \mathbb{Z}^+$, a $Z_n$-instance is minimal provided it has no
$Z_n$-instance as a proper factor.

Let $m(n, q)$ be the number of minimal $Z_n$-instances over a fixed q-letter alphabet.

The function $m(n, q)$ was first introduced by Rytter and Shur (2014+). They used
this concept of minimal Zimin-instances to improve the upper bounds of $f(3, q)$ and
$f(4, 2)$.
Lemma 2.12 (Rytter and Shur 2014+, Lemma 4.6). The following holds for any
integers $n, q \ge 2$:

$f(n+1, q) \le (f(n, q) + 1) \cdot m(n, q) + f(n, q)$.

Lemma 2.13 (Rytter and Shur 2014+, Lemma 4.7).

$m(2, q) = q! \cdot \sum_{i=0}^{q-1} \frac{2^{q-1-i}}{i!}$.

Theorem 2.14 (Rytter and Shur 2014+, Theorem 4.4).

• $f(1, q) = 1$;

• $f(2, q) = 2q + 1$;

• $f(3, 2) = 29$, $f(3, q) \le \sqrt{e} \cdot 2^q (q+1)! + 2q + 1$;

• $f(4, 2) \le 236489$.

Lemma 2.12 follows from the same method used in Theorem 2.2. The bound on
$f(4, 2)$ was established using a computer search to find $m(3, 2) = 7882$.
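
The arithmetic behind the bound on $f(4, 2)$ can be checked directly from Lemma 2.12, using $f(3, 2) = 29$ and $m(3, 2) = 7882$ as quoted above. The function name below is hypothetical, and the value $m(2, 2) = 6$ used in the second check is an assumption from a hand enumeration (aaa, bbb, aba, bab, abba, baab), not a figure stated in the text.

```python
def lemma_2_12_bound(f_n, m_n):
    """Upper bound on f(n + 1, q) given f(n, q) and m(n, q), as in Lemma 2.12."""
    return (f_n + 1) * m_n + f_n

# f(4, 2) <= (f(3, 2) + 1) * m(3, 2) + f(3, 2)
bound_f42 = lemma_2_12_bound(29, 7882)

# Assumed m(2, 2) = 6 (hand count); gives an upper bound on f(3, 2).
bound_f32 = lemma_2_12_bound(5, 6)
```

The first value reproduces 236489 exactly. The second gives 41, which is consistent with, though weaker than, the exact value $f(3, 2) = 29$.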
Chapter 3


Word Densities 


Definition 3.1. The factor density of V in W, denoted $d(V,W)$, is the proportion
of length-$|V|$ substrings of W that are copies of V; that is,

$$d(V,W) = \frac{\left|\{(i,j) : 0 \le i < j \le |W|,\ W[i,j] = V\}\right|}{|W| + 1 - |V|}.$$

The (instance) density of V in W, denoted $\delta(V,W)$, is the proportion of substrings
of W that are instances of V; that is,

$$\delta(V,W) = \frac{\left|\{(i,j) : 0 \le i < j \le |W|,\ W[i,j] \text{ is a } V\text{-instance}\}\right|}{\binom{|W|+1}{2}}.$$

The (q-)liminf density of V is

$$\delta(V,q) = \liminf_{\substack{W \in [q]^* \\ |W| \to \infty}} \delta(V,W).$$
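Both densities are easy to compute for short words. The following is a minimal sketch, assuming the denominators $|W| + 1 - |V|$ and $\binom{|W|+1}{2}$; the backtracking helper `is_instance` and the other names are illustrative, not from the source.

```python
from fractions import Fraction
from math import comb

def is_instance(v, w, images=None):
    """True if w = phi(v) for some nonerasing homomorphism phi (simple backtracking)."""
    images = images or {}
    if not v:
        return w == ""
    x, rest = v[0], v[1:]
    if x in images:
        img = images[x]
        return w.startswith(img) and is_instance(rest, w[len(img):], images)
    # Try every nonempty image for x that leaves a letter for each remaining symbol.
    return any(
        is_instance(rest, w[cut:], {**images, x: w[:cut]})
        for cut in range(1, len(w) - len(rest) + 1)
    )

def factor_density(v, w):
    """d(V, W): proportion of length-|V| substrings of w equal to v."""
    windows = len(w) + 1 - len(v)
    hits = sum(w[i:i + len(v)] == v for i in range(windows))
    return Fraction(hits, windows)

def instance_density(v, w):
    """delta(V, W): proportion of nonempty substrings of w that are v-instances."""
    n = len(w)
    hits = sum(is_instance(v, w[i:j]) for i in range(n) for j in range(i + 1, n + 1))
    return Fraction(hits, comb(n + 1, 2))
```

With these definitions, `instance_density("PhD", "Rorabaugh")` reproduces the value 28/45 from the abstract, since a word with no repeated letter matches every substring of length at least 3.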

The liminf density is defined in terms of alphabet $[q]$ for convenience, but any
fixed q-letter alphabet would suffice. We need not define a limsup density or liminf
factor density, as these would always be trivially 1 or 0. A $\Sigma$-limsup factor density of
V might be of interest for alphabet $\Sigma \supset L(V)$, but we do not investigate this here.
Table 3.1 below gives a numeric summary of the best known bounds for $\delta(Z_n, q)$.
The value of $\delta(Z_2, q)$ for $q \ge 2$ is from Theorem 3.9. For n = 3, the upper bound
comes from Section 3.2.1, and the lower bounds are stated in Corollary 3.13. There
we establish that $\delta(Z_3, 2) \ge 1/54$, but Section 3.3 gives reason to believe that the truth
is greater than 1/28. Lower bounds for $\delta(Z_4, q)$ are found in Theorem 3.10, though
the best lower bound for q = 2 is in Corollary 3.13. Finally, the best upper bounds
for $\delta(Z_n, q)$ when $n \ge 4$ are from Section 4.14.
Table 3.1 Best known bounds for the q-liminf density of $Z_n$.


5(^n,g) 

q = 2 

3 

4 

5 


n = 2 

1/2 = .5 

1/3 ^ .333 

1/4 = .25 

1/5 = .2 


3 

.119 

1.85-10-2 

1.84-10-2 

8.33 - 10-^ 

5.19 - 10-3 

5.31 - 10-3 

2.00 - 10-3 
3.22 - 10-^ 


4 

1.12 ■ 10“^ 
2.40 ■ 10“^ 

8.80 - 10-^ 

6.64 - 10-392943 

3.23 - 10-^ 

9 42 - 10“233250395 

2.58 - 10-3 


5 

3.43 ■ 10“® 

6.13 - 10-^3 

3.01 - 10-^3 

8.46 - 10-^9 









3.1 Density Comparisons 

For graphs F and G, $t(F,G)$ is the homomorphism density of F in G:

$$t(F,G) = \frac{\left|\{\varphi : V(F) \to V(G) \mid xy \in E(F) \Rightarrow \varphi(x)\varphi(y) \in E(G)\}\right|}{|V(G)|^{|V(F)|}}.$$

$K_n$ is the complete graph on n vertices; that is, the graph $\left([n], \binom{[n]}{2}\right)$ with all possible
edges. In particular, $K_2$ is often simply called the edge graph, and $K_3$ the triangle
graph. For every graph G, we can plot an ordered pair $(x,y) = (t(K_2,G), t(K_3,G))$.
The closure of the set of all such points forms a connected region in $[0,1]^2$ (see
Section 2.1 of Lovász 2012), with which we can visualize the relationship between
edge-densities and triangle-densities in graphs. The tight upper bound for this region
is $y \le x^{3/2}$, which is a case of the Kruskal-Katona Theorem (Kruskal 1963, Katona
1968). The lower bound of $y \ge x(2x - 1)$ is a result of Goodman (1959), but was
shown to be tight only for $x = 1 - \frac{1}{k}$ by Bollobás (1976).

We perform a similar comparison for word densities of some fundamental words.
In Section 3.1.1, we calculate the limit set, as $|W| \to \infty$, of the closure of the
set of points of the form $(d(a^k, W), d(a^\ell, W))$. Then Section 3.1.2 shows all points
$(\delta(Z_2, W), \delta(Z_3, W))$ for all W of particular, small lengths, presenting them in the
context of bounds to be proved later.
3.1.1 Factor Density of $a^k$

Lemma 3.2. For word W and integers $0 < k < \ell$,

$$d(a^\ell, W) \le d(a^k, W),$$

with equality only when either $d(a^\ell, W) = 1$ (that is, $W = a^m$ with $m \ge \ell$) or
$d(a^\ell, W) = 0$.

Proof. Within any $b a^r c$ in W with $a \notin \{b, c\}$ and $r \ge \ell$, there are $\ell - k$ more copies
of $a^k$ than of $a^\ell$. Hence, unless $d(a^\ell, W) = 0$,

$$d(a^k, W) \ge \frac{(|W| + 1 - \ell)\, d(a^\ell, W) + (\ell - k)}{|W| + 1 - k} \ge d(a^\ell, W),$$

with equality on the right only when $d(a^\ell, W) = 1$. □


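Lemma 3.3. For integers $0 < k < \ell$ and rational number $d_k \in \left[0, \frac{\ell - k}{\ell}\right] \cap \mathbb{Q}$, there
exist arbitrarily large words W with $d(a^k, W) = d_k$ and $d(a^\ell, W) = 0$.

Proof. Let $d_k = \frac{u}{v}$ for integers $0 \le u \le v$ with $v > 0$. Then $\frac{u}{v} = d_k \le \frac{\ell - k}{\ell}$ implies
$v(\ell - k) - u\ell \ge 0$. For $r \in \mathbb{Z}^+$, let $W_r = (a^{\ell-1} b)^{ru}\, b^{\,r(v(\ell-k) - u\ell) + k - 1}$. The number of
length-k substrings in $W_r$ is

$$|W_r| + 1 - k = (\ell - 1 + 1)(ru) + (r(v(\ell - k) - u\ell) + k - 1) + 1 - k = rv(\ell - k).$$

Now $a^\ell$ is not a factor of $W_r$, and the number of occurrences of $a^k$ in $W_r$ is

$$((\ell - 1) + 1 - k)(ur) = ru(\ell - k).$$

Therefore, $d(a^\ell, W_r) = 0$ and

$$d(a^k, W_r) = \frac{ru(\ell - k)}{rv(\ell - k)} = \frac{u}{v}.$$

□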
Lemma 3.4. For integers $0 < k < \ell$, and as $|W| \to \infty$,

$$\ell\,(d(a^k, W) - 1) \lesssim k\,(d(a^\ell, W) - 1).$$

Proof. Let $|W| = M$. For given W, set $d_k = d(a^k, W)$. Also, let $c_k$ count the number
of maximal factors in W of the form $a^x$ for $k \le x \le \ell - 1$ and $A_k$ count the number of
$a^k$-occurrences in the $c_k$ such strings, so $A_k \le (\ell - k) c_k$. Similarly, set $d_\ell = d(a^\ell, W)$
and let $c_\ell$ count the number of maximal factors in W of the form $a^x$ for $\ell \le x$ and
$A_\ell$ count the number of $a^\ell$-occurrences in the $c_\ell$ such strings. Hence, as $M \to \infty$,

$$d_k = \frac{c_\ell(\ell - k) + A_\ell + A_k}{M + 1 - k} \sim \frac{c_\ell(\ell - k) + A_\ell + A_k}{M + 1};$$

$$d_\ell = \frac{A_\ell}{M + 1 - \ell} \sim \frac{A_\ell}{M + 1};$$

$$M \ge \ell c_\ell + A_\ell + k c_k + A_k - 1.$$

The desired asymptotic inequality is $\ell(d_k - 1) \lesssim k(d_\ell - 1)$, which is equivalent to
$\ell d_k - k d_\ell \lesssim \ell - k$. Applying what we said about $d_k$, $d_\ell$, and M:

$$\ell d_k - k d_\ell \sim \frac{\ell[c_\ell(\ell - k) + A_\ell + A_k] - k[A_\ell]}{M + 1} \le \frac{\ell[c_\ell(\ell - k) + A_\ell + A_k] - k[A_\ell]}{\ell c_\ell + A_\ell + k c_k + A_k}.$$

Therefore, it suffices to show one of the following equivalent statements, the last of
which we already established.

$$\frac{\ell[c_\ell(\ell - k) + A_\ell + A_k] - k[A_\ell]}{\ell c_\ell + A_\ell + k c_k + A_k} \le \ell - k;$$

$$\ell[\ell c_\ell + A_\ell + A_k] - k[\ell c_\ell + A_\ell] \le (\ell - k)[\ell c_\ell + A_\ell + k c_k + A_k];$$

$$k\left([\ell c_\ell + A_\ell + k c_k + A_k] - [\ell c_\ell + A_\ell]\right) \le \ell\left([\ell c_\ell + A_\ell + k c_k + A_k] - [\ell c_\ell + A_\ell + A_k]\right);$$

$$k(k c_k + A_k) \le \ell(k c_k);$$

$$k c_k + A_k \le \ell c_k;$$

$$A_k \le (\ell - k) c_k.$$

□

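Lemma 3.5. Let $0 < k < \ell$ be integers and $(d_k, d_\ell) \in \mathbb{Q}^2$ be found on the triangle
defined by the following inequalities:

• $0 \le d_\ell \le d_k$;

• $k(d_\ell - 1) \ge \ell(d_k - 1)$.

Then for all $\varepsilon > 0$, there exist arbitrarily long words W such that

$$|d(a^k, W) - d_k| < \varepsilon \quad \text{and} \quad |d(a^\ell, W) - d_\ell| < \varepsilon.$$

Proof. The lines $k(d_\ell - 1) = \ell(d_k - 1)$ and $d_\ell = 0$ intersect when $d_k = \frac{\ell - k}{\ell}$, so we can break
the triangle into two cases:

(I) $0 \le d_\ell \le d_k \le \frac{\ell - k}{\ell}$.

(II) $0 \le d_\ell \le d_k$, $\frac{\ell - k}{\ell} < d_k$, $k(d_\ell - 1) \ge \ell(d_k - 1)$.

Without loss of generality, let $d_k = \frac{u_k}{v}$ and $d_\ell = \frac{u_\ell}{v}$ for some integers $u_\ell, u_k, v \in \mathbb{Z}$
satisfying $0 \le u_\ell \le u_k \le v \ne 0$. For $r \in \mathbb{Z}^+$, define length-nr word $W_r$ to be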

14 = 
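which is equivalent to both of the following:

$$(v - u_\ell)(\ell - k) \ge \ell(u_k - u_\ell);$$

$$\frac{\ell - k}{\ell} \ge \frac{u_k - u_\ell}{v - u_\ell}.$$

Case (I): Since $u_k \le v$,

$$\frac{u_k - u_\ell}{v - u_\ell} \le \frac{u_k}{v} = d_k \le \frac{\ell - k}{\ell}.$$

Case (II): Since $k(d_\ell - 1) \ge \ell(d_k - 1)$,

$$\frac{k}{\ell} \le \frac{1 - d_k}{1 - d_\ell},$$

which implies

$$\frac{\ell - k}{\ell} = 1 - \frac{k}{\ell} \ge 1 - \frac{1 - d_k}{1 - d_\ell} = \frac{d_k - d_\ell}{1 - d_\ell} = \frac{u_k - u_\ell}{v - u_\ell}.$$

□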

Theorem 3.6. For integers $0 < k < \ell$ and ordered pair $(x,y) \in [0,1]^2$, there exist
arbitrarily long words W with $d(a^k, W) \sim x$ and $d(a^\ell, W) \sim y$ if and only if $(x,y)$
falls in the triangular region shown in Figure 3.1, defined as follows:

• $0 \le y \le x$; and

• $k(y - 1) \ge \ell(x - 1)$.

Proof. The upper and lower bounds are established in Lemmas 3.2 and 3.4, respectively.
The density of points in this triangle is established in Lemma 3.5. □
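The exact inequality of Lemma 3.2 and the word family from the proof of Lemma 3.3 can be checked numerically. The sketch below assumes the Lemma 3.3 words have the form $(a^{\ell-1}b)^{ru}\, b^{\,r(v(\ell-k)-u\ell)+k-1}$, which is how the construction is read here; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def d(power, w, letter="a"):
    """Factor density d(a^power, w); requires len(w) >= power."""
    target = letter * power
    windows = len(w) + 1 - power
    hits = sum(w[i:i + power] == target for i in range(windows))
    return Fraction(hits, windows)

def lemma_3_3_word(k, l, u, v, r):
    """Word with d(a^k, W) = u/v and d(a^l, W) = 0, assuming u/v <= (l-k)/l."""
    padding = r * (v * (l - k) - u * l) + k - 1
    return ("a" * (l - 1) + "b") * (r * u) + "b" * padding

def lemma_3_2_holds(k, l, max_len):
    """Exhaustively check d(a^l, W) <= d(a^k, W) over words on {a, b}."""
    return all(
        d(l, "".join(t)) <= d(k, "".join(t))
        for n in range(l, max_len + 1)
        for t in product("ab", repeat=n)
    )
```

For instance, `lemma_3_3_word(2, 4, 1, 3, 2)` is `aaabaaabbbbbb`, which has $d(a^2, W) = 1/3$ and $d(a^4, W) = 0$. The lower boundary from Lemma 3.4 is only asymptotic, so it is not asserted for finite words.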

3.1.2 Instance Density of Zimin Words

The same sort of comparison as we see in Theorem 3.6 can also be made for instance
densities. Figure 3.2 shows the relationship between the instance densities of $Z_2$ and
$Z_3$ in binary words of length 28. See Appendix B for plots corresponding to binary
words of lengths 13, 16, 19, 22, 25, and 28 and the code used to generate the points.
The graphs also give a preview of some asymptotic results that we will establish later.

Figure 3.1 Relation between $d(a^k, W)$ and $d(a^\ell, W)$
for $0 < k < \ell$ as $|W| \to \infty$.


3.2 Minimum Density of Zimin Words 


Recall that $\delta(Z, W)$, the (instance) density of word Z, is the proportion of substrings of
W that are Z-instances. Thus, $\delta(Z, W)$ can always be written as a rational number
with denominator $\binom{|W|+1}{2}$, the number of substrings of W. Let us begin with the
following trivial facts.

Fact 3.7. $\delta(Z_1, W) = 1$ for every nonempty word $W \ne \varepsilon$.

Fact 3.8. For any $q \in \mathbb{Z}^+$, if V has no recurring letter, $\delta(V, q) = 1$.

Proof. The density of V is bounded above by 1. As $|W|$ grows, the proportion of
substrings of length at least $|V|$ goes to 1:

$$\frac{\sum_{\ell = |V|}^{|W|} (|W| + 1 - \ell)}{\binom{|W|+1}{2}} \sim 1.$$

Since no letter occurs twice in V, every word of length at least $|V|$ is a V-instance. □

The remainder of this chapter is primarily devoted to finding $\delta(Z_n, q)$, the liminf
density of Zimin words.
Figure 3.2 All $(x, y)$ with $x = \delta(Z_2, W)$, $y = \delta(Z_3, W)$ for $W \in [2]^{28}$.
Assuming binary W:

The line $y = x$ is an absolute upper bound.

The vertical blue line is $\delta(Z_2, 2) = 1/2$.

The horizontal blue line is a lower bound on $\delta(Z_3, 2)$.

The point at approximately (0.7322, 0.1194) shows expected densities in large random W.

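Theorem 3.9.

$$\delta(Z_2, q) = \frac{1}{q}.$$

Proof. Fix alphabet $\{x_0, \ldots, x_{q-1}\}$. Given word W, let $a_i$ be the number of occurrences
of $x_i$ in W for each $i < q$. The number of $Z_2$-instances of the form $x_i B x_i$ is at
least

$$\binom{a_i}{2} - (a_i - 1),$$

where $(a_i - 1)$ is subtracted to avoid counting consecutive occurrences of $x_i$. Therefore,
using the Cauchy-Schwarz inequality,

$$\delta(Z_2, W) \ge \frac{\sum_{i=0}^{q-1} \left[ \binom{a_i}{2} - (a_i - 1) \right]}{\binom{|W|+1}{2}} = \frac{\frac{1}{2} \sum_{i=0}^{q-1} a_i^2 - \frac{3}{2} \sum_{i=0}^{q-1} a_i + q}{\binom{|W|+1}{2}} \ge \frac{\frac{|W|^2}{2q} - \frac{3|W|}{2} + q}{\binom{|W|+1}{2}} \sim \frac{1}{q}.$$

Consider words $W_k = x_0^k x_1^k \cdots x_{q-1}^k$, so $|W_k| = qk$. Every $Z_2$-instance in $W_k$ is
a subword of the form $x_i^\ell$ for $3 \le \ell \le k$. Therefore

$$\delta(Z_2, W_k) = \frac{\sum_{i=0}^{q-1} \binom{k-1}{2}}{\binom{qk+1}{2}} \sim \frac{qk^2/2}{(qk)^2/2} = \frac{1}{q}.$$

□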
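The extremal family in this proof can be verified by brute force for small parameters. The sketch below assumes the words are $W_k = x_0^k x_1^k \cdots x_{q-1}^k$ and uses the fact that a word is a $Z_2$-instance exactly when it can be written XYX with X and Y nonempty. Names and the digit alphabet are illustrative.

```python
from fractions import Fraction
from math import comb

def is_z2_instance(w):
    """True if w = X Y X with X, Y nonempty (an instance of Z_2 = aba)."""
    return any(w[:b] == w[-b:] for b in range(1, (len(w) - 1) // 2 + 1))

def z2_density(w):
    """delta(Z_2, w): proportion of nonempty substrings that are Z_2-instances."""
    n = len(w)
    hits = sum(is_z2_instance(w[i:j]) for i in range(n) for j in range(i + 1, n + 1))
    return Fraction(hits, comb(n + 1, 2))

def block_word(q, k):
    """The word 0^k 1^k ... (q-1)^k, for q <= 10."""
    return "".join(str(i) * k for i in range(q))

def predicted_density(q, k):
    """q * C(k-1, 2) instances (runs of length 3..k inside each block) over C(qk+1, 2)."""
    return Fraction(q * comb(k - 1, 2), comb(q * k + 1, 2))
```

For example, `z2_density(block_word(2, 5))` is 12/55, and the ratio tends to $1/q$ as k grows.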


Recall that the function $f(n, q)$ from Chapter 2 gives the least M such that every
q-ary word of length M encounters $Z_n$.

Theorem 3.10.

$$\delta(Z_{n+1}, q) \ge \frac{1}{(f(n,q) - 2^n + 2)^2\, q^{f(n,q)+1}}.$$

Proof. On a fixed q-letter alphabet, there are fewer than $q^{f(n,q)+1}$ words of length
at most $f(n,q)$. In particular, there are fewer than $q^{f(n,q)+1}$ $Z_n$-instances of length
at most $f(n,q)$. If given word W is spliced into substrings of length $f(n,q)$, each
substring is guaranteed to contain a $Z_n$-instance. In fact, since the shortest images
of $Z_n$ are length $2^n - 1$, we can allow the substrings to overlap by $2^n - 2$ letters and
still avoid counting the same encounter of $Z_n$ twice. Picking one $Z_n$-instance from
each substring, we form a set of $\left\lfloor \frac{|W|}{f(n,q) - (2^n - 2)} \right\rfloor$ nonoverlapping $Z_n$-occurrences in W.
Enumerate the $Z_n$-instances of length at most $f(n,q)$ by $V_0, V_1, \ldots, V_{k-1}$ for some
$k < q^{f(n,q)+1}$. Let $a_i$ be the number of occurrences of $V_i$ in the set for each $i < k$.
Then

$$\sum_{i=0}^{k-1} a_i = \left\lfloor \frac{|W|}{f(n,q) - (2^n - 2)} \right\rfloor.$$

Therefore,

$$\delta(Z_{n+1}, W) \gtrsim \frac{\sum_{i=0}^{k-1} \binom{a_i}{2}}{\binom{|W|+1}{2}} \gtrsim \frac{\frac{1}{2k} \left\lfloor \frac{|W|}{f(n,q) - (2^n - 2)} \right\rfloor^2}{\frac{|W|^2}{2}} \sim \frac{1}{(f(n,q) - 2^n + 2)^2\, k} > \frac{1}{(f(n,q) - 2^n + 2)^2\, q^{f(n,q)+1}}.$$

□


We call a $Z_n$-instance minimal provided it has no proper factor that is also a
$Z_n$-instance (a concept introduced by Rytter and Shur 2014+). Recall that $m(n,q)$ is
the number of minimal $Z_n$-instances over a fixed q-letter alphabet. Any time a string
encounters $Z_n$, it must contain a minimal $Z_n$-instance. Therefore, we can replace
$q^{f(n,q)+1}$ in Theorem 3.10 with $m(n, q)$.

Corollary 3.11.

$$\delta(Z_{n+1}, q) \ge \frac{1}{(f(n,q) - 2^n + 2)^2\, m(n,q)}.$$
Lemma 3.12 (Corollary of Lemma 2.13).

$$m(2, q) < q!\, 2^q.$$

Recall $f(2,q) = 2q + 1$, $m(2,2) = 6$, $f(3,2) = 29$ (Table 2.1), and $m(3,2) = 7882$
(Rytter and Shur 2014+).

Corollary 3.13.

$$\delta(Z_3, 2) \ge \frac{1}{54};$$

$$\delta(Z_3, q) \ge \frac{1}{(2q - 1)^2\, q!\, 2^q};$$

$$\delta(Z_4, 2) \ge \frac{1}{4169578}.$$

We have strong evidence in Section 3.3 that $\delta(Z_3, 2) \ge \frac{1}{28}$.
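The numeric lower bounds can be reproduced from the formula of Corollary 3.11, read here as $1 / \big((f(n,q) - 2^n + 2)^2\, m(n,q)\big)$, together with the recalled values of $f$ and $m$. This is a sketch with hypothetical function names.

```python
from fractions import Fraction
from math import factorial

def corollary_3_11(n, f_n, m_n):
    """Lower bound on delta(Z_{n+1}, q) from f(n, q) and m(n, q)."""
    return Fraction(1, (f_n - 2 ** n + 2) ** 2 * m_n)

def z3_bound(q):
    """Bound on delta(Z_3, q) using f(2, q) = 2q + 1 and m(2, q) < q! * 2^q."""
    return Fraction(1, (2 * q - 1) ** 2 * factorial(q) * 2 ** q)

bound_z3_binary = corollary_3_11(2, 5, 6)      # uses m(2,2) = 6
bound_z4_binary = corollary_3_11(3, 29, 7882)  # uses f(3,2) = 29, m(3,2) = 7882
```

Here `bound_z3_binary` is 1/54 (about 1.85e-2) and `bound_z4_binary` is 1/4169578 (about 2.40e-7), in agreement with the q = 2 column of Table 3.1.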


3.2.1 Limits of Probabilities 

We denote with $I_M(V, q)$ the probability that a random q-ary word of length M is a
V-instance. We prove in Chapter 4 that the limit probability $I(V, q) = \lim_{M \to \infty} I_M(V, q)$
always exists. Consequently,

$$\delta(V, q) \le I(V, q).$$

In Chapter 5, we provide upper bounds for $I(Z_n, q)$ and a method to explicitly
calculate $I(Z_2, q)$ and $I(Z_3, q)$, thus establishing various upper bounds for $\delta(Z_n, q)$.


3.3 The de Bruijn Graph 

Definition 3.14. For a fixed alphabet $\Sigma$ and positive integer k, the k-dimensional
de Bruijn graph is a directed graph with vertex set $\Sigma^k$ and an edge from U to W
whenever $U = aV$ and $W = Vb$ for some $V \in \Sigma^{k-1}$ and $a, b \in \Sigma$.
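A direct way to see Definition 3.14 in action is to build the graph as an adjacency map. This is a minimal sketch; the function name and the representation of vertices as strings are illustrative choices.

```python
from itertools import product

def de_bruijn_graph(alphabet, k):
    """Adjacency map of the k-dimensional de Bruijn graph over the given alphabet.

    Vertex U = aV has an edge to each W = Vb, i.e. drop the first letter
    and append any letter b.
    """
    vertices = ["".join(t) for t in product(alphabet, repeat=k)]
    return {u: [u[1:] + b for b in alphabet] for u in vertices}

graph = de_bruijn_graph("01", 4)
```

A word of length M + k - 1 then corresponds to a walk visiting M vertices: its successive length-k windows.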
Evdokimov (1983) construed words as walks on de Bruijn graphs to prove bounds
for permutation pattern avoidance, and his work was translated from German into
English by Burstein and Kitaev (2006). We now demonstrate how this perspective
can be utilized to find minimum word densities.

Definition 3.15. A bifix of W is a word that is both a proper initial string and
terminal string of W. W is bifix-free provided W has no bifix. W is V-bifix-free provided
W has no bifix that is a V-instance. W is a minimal V-instance provided there is no
proper factor of W that is a V-instance.

Every $Z_3$-instance can be described by its shortest $Z_2$-bifix (that is, its $Z_2$-bifix
that is itself $Z_2$-bifix-free). While building long words you can undercount the number
of $Z_3$-instances by keeping track of the number of each $Z_2$-bifix-free $Z_2$-instance of
length at most k.

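Lemma 3.16. Fix integers $q, n \ge 2$. Let $\mathcal{V}$ be a finite set of $Z_{n-1}$-bifix-free
$Z_{n-1}$-instances in $[q]^*$. For $V \in \mathcal{V}$, let $c_V$ be the count of V-occurrences in W. Then

$$\delta(Z_n, W) \ge \frac{1}{\binom{|W|+1}{2}} \sum_{V \in \mathcal{V}} \left[ \binom{c_V}{2} - |V| c_V \right].$$

Proof. For any given V-occurrence, the next $|V|$ occurrences might overlap or be
consecutive, not allowing for a $Z_n$-instance. But that still leaves at least $\binom{c_V}{2} - |V| c_V$
words of the form $VUV$ where $|U| > 0$. □

Since Zimin words are unavoidable, if $\mathcal{V}$ contains all the minimal Zimin words,
then the subtracted $|V| c_V$ terms are asymptotically negligible, because

$$\lim_{|W| \to \infty} \sum_{V \in \mathcal{V}} c_V = \infty.$$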

For demonstration, the set of minimal $Z_2$-instances in $\{0,1\}^*$, which are inherently
$Z_2$-bifix-free, is $\mathcal{V} = \{000, 010, 101, 111, 0110, 1001\}$. Let us look at word construction
as taking a walk on the 4-dimensional de Bruijn graph. Each of the $2^4$ vertices is
a nyble, which is a 4-bit string (half the length of a byte). In Figure 3.3, a solid
arrow indicates appending a 1 and a dashed arrow, a 0.

Figure 3.3 $Z_2$-instances on the 4-dimensional de Bruijn graph.

Left is the 4-dimensional de Bruijn graph; right is a graph indicating the minimal
$Z_2$-instances encountered walking on the de Bruijn graph.


For a random walk of length M on the de Bruijn graph (so the corresponding word
W has length M + 3), let $Q_n(M)$ be the number of times node n showed up, which
means $\sum_{n=0}^{15} Q_n(M) = M$. We can count the number of occurrences, $R_V(M)$, of each
minimal $Z_2$-instance, V, in W as follows. (To avoid any undercount, assume we do
not start on a node beginning with a length-3 minimal $Z_2$-instance.)

$R_{000}(M) = Q_{0000}(M) + Q_{1000}(M)$; $R_{111}(M) = Q_{0111}(M) + Q_{1111}(M)$;

$R_{010}(M) = Q_{0010}(M) + Q_{1010}(M)$; $R_{101}(M) = Q_{0101}(M) + Q_{1101}(M)$;

$R_{0110}(M) = Q_{0110}(M)$; $R_{1001}(M) = Q_{1001}(M)$.
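These identities say that an occurrence of a minimal $Z_2$-instance is detected at the node (length-4 window) that ends with it. A quick numerical check, as a sketch: the word below is an arbitrary pseudo-random binary word whose first three letters (001) are not a minimal $Z_2$-instance, matching the parenthetical assumption. All names are illustrative.

```python
import random
from collections import Counter

MINIMAL_Z2 = ["000", "010", "101", "111", "0110", "1001"]

def count_occurrences(v, w):
    """R_V: number of occurrences of v as a factor of w."""
    return sum(w[i:i + len(v)] == v for i in range(len(w) - len(v) + 1))

def node_counts(w):
    """Q_n: how often each 4-bit node is visited by the walk spelled by w."""
    return Counter(w[i:i + 4] for i in range(len(w) - 3))

def counts_via_nodes(w):
    """Recover each R_V as the sum of Q_n over the nodes n that end with V."""
    Q = node_counts(w)
    return {v: sum(c for node, c in Q.items() if node.endswith(v)) for v in MINIMAL_Z2}

rng = random.Random(0)
word = "0011" + "".join(rng.choice("01") for _ in range(200))
```

For `word`, `counts_via_nodes(word)[v]` agrees with `count_occurrences(v, word)` for every minimal instance v, and the node counts sum to `len(word) - 3`.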
As $M \to \infty$, the density of $Z_3$-subwords is asymptotically at least

$$\frac{\sum_{V \in \mathcal{V}} \binom{R_V(M)}{2}}{\binom{M+4}{2}} \sim \frac{\sum_{V \in \mathcal{V}} R_V(M)^2}{M^2}.$$

One can assign probabilities to the outgoing edges of each nyble. Define probability
tuple $p = (p_n : n \in \{0, \ldots, 15\}) \in [0,1]^{16}$ with $p_n$ being the probability that
node n is followed by a 1. Given a long random walk with fixed probabilities
p, define $q = (q_n : n \in \{0, \ldots, 15\}) \in [0,1]^{16}$ where $q_n$ is the proportion of node-n
encounters in the walk. This leads to the following system of 17 equations with
$k \in \{0, 1, 2, 3, 4, 5, 6, 7\}$:

$$q_{2k} = q_k(1 - p_k) + q_{k+8}(1 - p_{k+8});$$

$$q_{2k+1} = q_k p_k + q_{k+8} p_{k+8};$$

$$1 = \sum_{i=0}^{15} q_i.$$

Further, define $r_V$ as $R_V(M)$ above, substituting $q_n$ for $Q_n(M)$.

$r_{000} = q_0 + q_8$; $r_{010} = q_2 + q_{10}$; $r_{0110} = q_6$;

$r_{111} = q_7 + q_{15}$; $r_{101} = q_5 + q_{13}$; $r_{1001} = q_9$.

Then the expected $Z_3$-density is asymptotically at least $d = \sum_{V \in \mathcal{V}} r_V^2$. By solving
the above system of 17 equations for the $q_n$ in terms of the $p_n$, rewrite d in terms of
the probabilities. Minimizing d over the 16-dimensional unit cube (each probability is
in [0, 1]) should give a lower bound for $\delta(Z_3, 2)$. We need only to show that for every
limit density of $\delta(Z_3, 2)$, or at least for the liminf-density, there exists an associated
set of probabilities for the de Bruijn graph.
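To experiment with this system, one can approximate the node proportions numerically and evaluate d. The sketch below is not the Sage optimization used in the text: it assumes node n is the 4-bit binary expansion of n (so appending bit c moves node k to node 2k + c mod 16), takes $d = \sum_V r_V^2$, and uses plain power iteration, which converges only when the walk is aperiodic (for example when every $p_n$ lies strictly between 0 and 1). All names are hypothetical.

```python
def stationary_proportions(p, steps=500):
    """Approximate the node proportions q_n for edge probabilities p by power iteration."""
    q = [1.0 / 16] * 16
    for _ in range(steps):
        new = [0.0] * 16
        for k in range(16):
            new[(2 * k) % 16] += q[k] * (1 - p[k])   # node k followed by a 0
            new[(2 * k + 1) % 16] += q[k] * p[k]     # node k followed by a 1
        q = new
    return q

def density_lower_bound(q):
    """d = sum of r_V^2 over the six minimal binary Z_2-instances."""
    r = [
        q[0] + q[8],    # r_000:  nodes 0000, 1000
        q[7] + q[15],   # r_111:  nodes 0111, 1111
        q[2] + q[10],   # r_010:  nodes 0010, 1010
        q[5] + q[13],   # r_101:  nodes 0101, 1101
        q[6],           # r_0110
        q[9],           # r_1001
    ]
    return sum(x * x for x in r)

uniform_d = density_lower_bound(stationary_proportions([0.5] * 16))
```

For the uniform walk every node has proportion 1/16, giving `uniform_d` equal to 9/128 (about 0.0703), well above the value 1/28 discussed next; a minimizer over p would be layered on top of these two functions.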

Using the function sage.numerical.optimize.minimize_constrained() in Sage (Stein
et al. 2014), one can obtain probabilities producing a lower bound for $Z_3$-density that
is slightly larger than 1/28. From these approximate results, we have identified the
following distinct probability edge-assignments which each give a density of exactly
1/28. For two of these, we also have associated families of words which exhibit the
given probabilities as n grows. (A dash "-" denotes that a node does not appear a positive
proportion of the time, so its probability is irrelevant.)

$p^{(1)} = (-, 4/5, 0, 3/5, 2/5, -, 1/5, 0, 1, 4/5, -, 3/5, 2/5, 1, 1/5, -)$;

$p^{(2)} = (-, 1, 0, 3/4, 1, -, 1/2, 0, 1, 1/2, -, 0, 1/4, 1, 0, -)$,

$W^{(2)}_n = (0001110010011100011011000111)^n$;

$p^{(3)} = (-, 1, -, 3/5, 2/5, -, 1/5, 0, 1, 1, 0, -, 2/5, 0, 1/5, -)$,

= (( 11010001 )^( 101001 ) 2 ( 110001 )^ 2 ( 1001 )®)’". 
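The value 1/28 can be confirmed for the periodic family built from the 28-letter block 0001110010011100011011000111. The sketch below assumes $d = \sum_V r_V^2$, with $r_V$ the limiting frequency of V per letter, which for a periodic word is the number of cyclic occurrences of V in one period divided by the period length. The names are illustrative.

```python
from fractions import Fraction

MINIMAL_Z2 = ["000", "010", "101", "111", "0110", "1001"]
PERIOD = "0001110010011100011011000111"

def cyclic_frequency(v, period):
    """Occurrences of v per letter in the infinite periodic word period^infinity."""
    n = len(period)
    doubled = period + period  # lets windows wrap around the end of the period
    return Fraction(sum(doubled[i:i + len(v)] == v for i in range(n)), n)

frequencies = {v: cyclic_frequency(v, PERIOD) for v in MINIMAL_Z2}
d = sum(r * r for r in frequencies.values())
```

The six frequencies are 3/28, 1/28, 1/28, 3/28, 2/28, 2/28 (in the order of `MINIMAL_Z2`), so d = 28/784 = 1/28.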

Conjecture 3.17. $\delta(Z_3, 2) > \frac{1}{28}$.

The conjecture is stated with a strict inequality, as we can presumably increase the lower
bound by using a larger set of $Z_2$-instances. For example, the set of all $Z_2$-bifix-free
$Z_2$-instances of length at most 5 is

{000, 010, 101, 111, 0110, 1001, 01001, 01101, 10010, 10110}.

We would then view words as walks on the 5-dimensional de Bruijn graph and minimize
the associated expression in $2^5 = 32$ variables.
Chapter 4


Density Dichotomy in Random Words 


Definition 2.4 is contained within Definition 4.1 below for completeness within this
chapter.

Definition 4.1. Fix n and select $W_n \in [q]^n$ uniformly at random. The expected
density of V is

$$\delta_n(V, q) = E(\delta(V, W_n)).$$

The asymptotic expected density of V is

$$\delta(V, q) = \lim_{n \to \infty} \delta_n(V, q).$$

The set of V-instances in $\Sigma^n$ is $\mathcal{I}_n(V, \Sigma)$. The probability that a random length-n
q-ary word is a V-instance is

$$I_n(V, q) = \frac{|\mathcal{I}_n(V, [q])|}{q^n}.$$

The asymptotic instance probability of V is

$$I(V, q) = \lim_{n \to \infty} I_n(V, q).$$

Sometimes we will count homomorphisms to attain density upper bounds.

Definition 4.2. Fix alphabets $\Gamma$ and $\Sigma$ and assume $V \in \Gamma^*$. An encounter of V, or
V-encounter, in W is an ordered triple $(a, b, \phi)$ where $W[a,b] = \phi(V)$ for nonerasing
homomorphism $\phi : \Gamma^* \to \Sigma^*$. When $\Gamma = L(V)$ and $W \in \Sigma^*$, denote with $\hom(V, W)$
the number of V-encounters in W. (Note that the conditions on $\Gamma$ and $\Sigma$ are necessary
for $\hom(V, W)$ to not be trivially 0 or $\infty$.) For $W_n \in [q]^n$ chosen uniformly at random,
the expected number of V-encounters is

$$\hom_n(V, q) = E(\hom(V, W_n)).$$

Example 4.3. $\hom(ab, cde) = 4$ since $cde[0,2]$ is an instance of ab by one homomorphism
$\{a,b\}^* \to \{c,d,e\}^*$, $cde[1,3]$ is an instance of ab by one homomorphism,
and $cde[0,3]$ is an instance of ab by two homomorphisms.

In fact, for $q \in \mathbb{Z}^+$, $\hom_3(ab, q) = 4$, since $\hom(ab, W) = 4$ for all $W \in [q]^3$.
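Encounters can be counted by backtracking over the possible images of each letter. The following sketch counts triples $(a, b, \phi)$ with $W[a,b] = \phi(V)$, where $\phi$ is a nonerasing homomorphism on the letters of V; the function names are illustrative.

```python
from itertools import product

def count_homomorphisms(v, w, images=None):
    """Number of nonerasing homomorphisms phi on the letters of v with phi(v) = w."""
    images = images or {}
    if not v:
        return 1 if w == "" else 0
    x, rest = v[0], v[1:]
    if x in images:
        img = images[x]
        if not w.startswith(img):
            return 0
        return count_homomorphisms(rest, w[len(img):], images)
    # Try every nonempty image for x that leaves a letter for each remaining symbol.
    return sum(
        count_homomorphisms(rest, w[cut:], {**images, x: w[:cut]})
        for cut in range(1, len(w) - len(rest) + 1)
    )

def hom(v, w):
    """hom(V, W): total number of V-encounters over all factors W[a, b]."""
    n = len(w)
    return sum(count_homomorphisms(v, w[a:b]) for a in range(n) for b in range(a + 1, n + 1))
```

`hom("ab", "cde")` returns 4, matching Example 4.3, and the same count is obtained for every word of length 3 regardless of the alphabet.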

4.1 The Dichotomy 

Theorem 4.4 (Cooper and Rorabaugh 2015+, Theorem 2.1). Let V be a word on
any alphabet. Fix integer $q \ge 2$. The following are equivalent:

(i) V is doubled (that is, every letter in V appears at least twice);

(ii) $\delta(V, q) = 0$.

Proof. First we prove (i) $\Rightarrow$ (ii). Let $W_n \in [q]^n$ be chosen uniformly at random.

Note that in $W_n$, there are in expectation the same number of encounters of V as
there are of any anagram of V. Indeed, if $V'$ is an anagram of V and $\phi$ is a nonerasing
homomorphism, then $|\phi(V')| = |\phi(V)|$.

Fact 4.5 (Cooper and Rorabaugh 2015+, Fact 2.2). If $V'$ is an anagram of V, then

$$\hom_n(V, q) = \hom_n(V', q).$$

Assume V is doubled and let $\Gamma = L(V)$ and $k = |\Gamma|$. Given Fact 4.5, we consider
an anagram $V' = XY$ of V, where $|X| = k$ and $\Gamma = L(X) = L(Y)$. That is, X
comprises one copy of each letter in $\Gamma$ and all the duplicate letters of V are in Y.

We obtain an upper bound for the average density of V by estimating $\hom_n(V', q)$.
To do so, sum over starting position i and length j of encounters of X in $W_n$ that
might extend to an encounter of $V'$. There are $\binom{j-1}{k-1}$ homomorphisms $\phi$ that map X
to $W_n[i, i+j]$ and the probability that $W_n[i+j, i+j+|\phi(Y)|] = \phi(Y)$ is at most $q^{-j}$.
Also, the series $\sum_{j=k}^{\infty} \binom{j-1}{k-1} q^{-j}$ converges (try the ratio test) to some c not dependent
on n.

$$\delta_n(V, q) \le \frac{\hom_n(V', q)}{\binom{n+1}{2}} \le \frac{1}{\binom{n+1}{2}} \sum_{i=0}^{n-|V|} \sum_{j=k}^{n-i} \binom{j-1}{k-1} q^{-j} < \frac{1}{\binom{n+1}{2}} \sum_{i=0}^{n-|V|} c = \frac{c(n - |V| + 1)}{\binom{n+1}{2}} = O(n^{-1}).$$


We prove (ii) $\Rightarrow$ (i) by contraposition. Assume there is a letter x that occurs
exactly once in V. Write $V = TxU$ where $L(V) \setminus L(TU) = \{x\}$. We obtain a lower
bound for $\delta_n(V, q) = E(\delta(V, W_n))$ by only counting encounters with $|\phi(TU)| = |TU|$.
Note that each such encounter is unique to its instance, preventing double-counting.
For this undercount, we sum over encounters with $W_n[i, i+j] = \phi(x)$.

$$\delta_n(V, q) = \delta_n(TxU, q) \ge \frac{1}{\binom{n+1}{2}} \sum_{i=|T|}^{n-|U|-1} \sum_{j=1}^{n-|U|-i} q^{-\|TU\|} = \frac{1}{\binom{n+1}{2}} \sum_{i=|T|}^{n-|U|-1} (n - |U| - i)\, q^{-\|TU\|} = \frac{\binom{n - |UT| + 1}{2}}{\binom{n+1}{2}}\, q^{-\|TU\|} \sim q^{-\|TU\|} > 0.$$

□
It behooves us now to develop more precise theory for these two classes of words:
doubled and nondoubled. Lemma 4.7 below both helps develop that theory and gives
insight into the detrimental effect that letter repetition has on encounter frequency.


Proposition 4.6 (Cooper and Rorabaugh 2015+, Proposition 2.3). For $k \in \mathbb{Z}^+$,
$\bar{r} = \{r_1, \ldots, r_k\} \in (\mathbb{Z}^+)^k$, and $d = \gcd_{i \in [k]}(r_i)$, there exists integer $N = N_{\bar{r}}$ such that
for every $n > N$ there exist coefficients $a_1, \ldots, a_k \in \mathbb{Z}^+$ such that $dn = \sum_{i=1}^{k} a_i r_i$ and
$a_i \le N$ for $i \ge 2$.


Proof. For each j G [ri/d], hnd integer coefficients so that jd is a linear combi¬ 
nation of the rp jd = Yl!i=i h^pri. Let m = 1 -|- min , the minimum taken over 
all i and j. Dehne = bP -f- m > 0 and R = Now for each j, 

k k k 

apVj = 'PP bPri + 'PP mri = jd -|- mR. 


2=1 


2=1 


2=1 


Set N = ri + mR. For n > N, identify jn € ^i/d] such that 


dn = jnd -|- mR (modri). 

Then a* = for i > 1 and oi = ^ {dn — YPi =2 C 

Lemma 4.7 (Cooper and Rorabaugh 2015+, Lemma 2.4). For any word V, let
$\Gamma = L(V) = \{x_1, \ldots, x_k\}$ where $x_i$ has multiplicity $r_i$ for each $i \in [k]$. Let U be
V with all letters of multiplicity $r = \min_{j \in [k]}(r_j)$ removed. Finally, let $\Sigma$ be any
finite alphabet with $|\Sigma| = q \ge 2$ letters. Then for a uniformly randomly chosen
V-instance $W \in \Sigma^{dn}$, where $d = \gcd_{j \in [k]}(r_j)$, there is asymptotically almost surely a
homomorphism $\phi : \Gamma^* \to \Sigma^*$ with $\phi(V) = W$ and $|\phi(U)| \le \sqrt{dn}$.


Proof. Let $a_n$ be the number of V-instances in $\Sigma^n$ and $b_n$ be the number of
homomorphisms $\phi : \Gamma^* \to \Sigma^*$ such that $|\phi(V)| = n$. Let $b_n^1$ be the number of these $\phi$ such
that $|\phi(U)| \le \sqrt{n}$ and $b_n^2$ the number of all other $\phi$ so that $b_n = b_n^1 + b_n^2$. Similarly, let
$a_n^1$ be the number of V-instances in $\Sigma^n$ for which there exists a $\phi$ counted by $b_n^1$ and
$a_n^2$ the number of instances with no such $\phi$, so $a_n = a_n^1 + a_n^2$. Observe that $a_n^2 \le b_n^2$.

Without loss of generality, assume $r_1 = r$ (rearrange the $x_i$ if not). We now utilize
$N = N_{\bar{r}}$ from Proposition 4.6. For sufficiently large n, we can undercount these instances by
counting homomorphisms $\phi$ with $|\phi(x_i)| = a_i$ for the $a_i$ attained from Proposition 4.6.
Indeed, distinct homomorphisms with the same image-length for every letter in V
produce distinct V-instances. Hence

= cq^ ’’ 0 

where c = depends on Id but not on n. To overcount 6^ (and 

by extension), we consider all ways to partition an n-letter length and so 

determine the lengths of the images of the letters in Id. However, for letters with 
multiplicity strictly greater than r, the sum of the lengths of their images must be at 
least ^/n. 



That is, the proportion of Id-instances of length dn that cannot be expressed with 
\4>{U)\ < y/dn diminishes to 0 as n grows. □ 
4.2 Density of Nondoubled Words

In Theorem 4.4, we show that the density of nondoubled V in long random words
(over a fixed alphabet with at least two letters) does not approach 0. The natural
follow-up question is: Does the density converge? To answer this question, we first
prove the following lemma. Fixing $V = TxU$ where x is a nonrecurring letter in
V, the lemma tells us that all but a diminishing proportion of V-instances can be
obtained by some $\phi$ with $|\phi(TU)|$ negligible.

Lemma 4.8 (Cooper and Rorabaugh 2015+, Lemma 3.1). Let $V = U_0 x_1 U_1 x_2 \cdots x_r U_r$
with $r \ge 1$, where $U = U_0 U_1 \cdots U_r$ is doubled with k distinct letters (though any
particular $U_j$ may be the empty word), the $x_i$ are distinct, and no $x_i$ occurs in U.
Further, let $\Gamma$ be the (k + r)-letter alphabet of V and let $\Sigma$ be any finite alphabet with
$q \ge 2$ letters. Then there exists a nondecreasing function $g(n) = o(n)$ such that,
for a randomly chosen V-instance $W \in \Sigma^n$, there is asymptotically almost surely a
homomorphism $\phi : \Gamma^* \to \Sigma^*$ with $\phi(V) = W$ and $|\phi(x_r)| \ge n - g(n)$.

Proof. Let $X_i = x_1 x_2 \cdots x_i$ for $0 \le i \le r$ (so $X_0 = \varepsilon$). For any word W, let $\Phi_W$ be
the set of homomorphisms $\{\phi : \Gamma^* \to \Sigma^* \mid \phi(V) = W\}$ that map V onto W. Define
$P_i$ to be the following proposition for $i \in [r]$:

There exists a nondecreasing function $f_i(n) = o(n)$ such that, for a randomly
chosen V-instance $W \in \Sigma^n$, there is asymptotically almost surely
a homomorphism $\phi \in \Phi_W$ such that $|\phi(U X_{i-1})| \le f_i(n)$.

The conclusion of this lemma is an immediate consequence of proposition $P_r$, with
$g(n) = f_r(n)$, which we will prove by induction. Lemma 4.7 provides the base case,
with r = 1 and $f_1(n) = \sqrt{n}$.

Let us prove the inductive step: $P_i$ implies $P_{i+1}$ for $i \in [r-1]$. Roughly speaking,
this says: If most instances of V can be made with a homomorphism $\phi$ where
$|\phi(U X_{i-1})|$ is negligible, then most instances of V can be made with a homomorphism
$\phi$ where $|\phi(U X_i)|$ is negligible.

Assume $P_i$ for some $i \in [r-1]$, and set $f(n) = f_i(n)$. Let $A_n$ be the set of
V-instances in $\Sigma^n$ such that $|\phi(U X_{i-1})| \le f(n)$ for some $\phi \in \Phi_W$. Let $B_n$ be the set
of all other V-instances in $\Sigma^n$. $P_i$ implies $|B_n| = o(|A_n|)$.

Case 1: $U_i = \varepsilon$, so $x_i$ and $x_{i+1}$ are consecutive in V. When $|\phi(U X_{i-1})| \le f(n)$, we
can define $\psi$ so that $\psi(x_i x_{i+1}) = \phi(x_i x_{i+1})$ and $|\psi(x_i)| = 1$; otherwise, let $\psi(y) = \phi(y)$
for $y \in \Gamma \setminus \{x_i, x_{i+1}\}$. Then $|\psi(U X_i)| \le f(n) + 1$ and $P_{i+1}$ holds with $f_{i+1}(n) = f_i(n) + 1$.

Case 2: $U_i \ne \varepsilon$, so $|U_i| > 0$. Let $g(n)$ be some nondecreasing function such that
$f(n) = o(g(n))$ and $g(n) = o(n)$. (This will be the $f_{i+1}$ for $P_{i+1}$.) Let $A_n^\alpha$ consist
of $W \in A_n$ such that $|\phi(U X_i)| \le g(n)$ for some $\phi \in \Phi_W$. Let $A_n^\beta = A_n \setminus A_n^\alpha$. The
objective henceforth is to show that $|A_n^\beta| = o(|A_n^\alpha|)$.

For $Y \in A_n^\beta$, let $\Phi_Y^\beta$ be the set of homomorphisms $\{\phi \in \Phi_Y : |\phi(U X_{i-1})| \le f(n)\}$
that disqualify Y from being in $B_n$. Hence $Y \in A_n$ implies $\Phi_Y^\beta \ne \emptyset$. Since $Y \notin A_n^\alpha$,
$\phi \in \Phi_Y^\beta$ implies $|\phi(U X_i)| > g(n)$, so $|\phi(x_i)| > g(n) - f(n)$. Pick $\phi_Y \in \Phi_Y^\beta$ as follows:

• Primarily, minimize $|\phi(U_0 x_1 U_1 x_2 \cdots U_{i-1} x_i)|$;

• Secondarily, minimize $|\phi(U_i)|$;

• Tertiarily, minimize $|\phi(U_0 x_1 U_1 x_2 \cdots U_{i-1})|$.

Roughly speaking, we have chosen $\phi_Y$ to move the image of $U_i$ as far left as
possible in Y. But since $Y \notin A_n^\alpha$, we want it further left!

To suppress the details we no longer need, let $Y = Y_1 \phi_Y(x_i) \phi_Y(U_i) \phi_Y(x_{i+1}) Y_2$,
where $Y_1 = \phi_Y(U_0 x_1 U_1 x_2 \cdots U_{i-1})$ and $Y_2 = \phi_Y(U_{i+1} x_{i+2} \cdots U_r)$.

Consider a word $Z \in \Sigma^n$ of the form $Y_1 Z_1 \phi_Y(U_i) Z_2 \phi_Y(U_i) \phi_Y(x_{i+1}) Y_2$, where $Z_1$ is
an initial string of $\phi_Y(x_i)$ with $2f(n) \le |Z_1| < g(n) - 2f(n)$ and $Z_2$ is a final string
of $\phi_Y(x_i)$. (See Figure 4.1.) In a sense, the image of $x_i$ was too long, so we replace
a leftward substring with a copy of the image of $U_i$. Let $C_Y$ be the set of all such Z
with $|Z_1|$ a multiple of $f(n)$. For every $Z \in C_Y$ we can see that $Z \in A_n^\alpha$, by defining
$\psi \in \Phi_Z$ as follows:

$$\psi(y) = \begin{cases} Z_1 & \text{if } y = x_i; \\ Z_2 \phi_Y(U_i) \phi_Y(x_{i+1}) & \text{if } y = x_{i+1}; \\ \phi_Y(y) & \text{otherwise.} \end{cases}$$

Figure 4.1 Replacing a section of $\phi_Y(x_i)$ in Y to create Z.


Claim 1: $\liminf_{|Y| = n \to \infty} |C_Y| = \infty$.

Since we want $2f(n) \le |Z_1| < g(n) - 2f(n)$, and $g(n) - 2f(n) < |\phi_Y(x_i)| - |\phi_Y(U_i)|$,
there are $g(n) - 4f(n)$ places to put the copy of $\phi_Y(U_i)$. To avoid any double-counting
that might occur when some Z and $Z'$ have their new copies of $\phi_Y(U_i)$ in overlapping
locations, we further required that $f(n)$ divide $|Z_1|$. This produces the following lower
bound:

$$|C_Y| \ge \left\lfloor \frac{g(n) - 4f(n)}{f(n)} \right\rfloor \to \infty.$$


Claim 2: For distinct $Y, Y' \in A_n^\beta$, $C_Y \cap C_{Y'} = \emptyset$.

To prove Claim 2, take $Y, Y' \in A_n^\beta$ with $Z \in C_Y \cap C_{Y'}$. Define $Y_1$, $Y_2$, $Y_1'$, and $Y_2'$
as above:

$Y_1 = \phi_Y(U_0 x_1 U_1 x_2 \cdots U_{i-1})$, $Y_2 = \phi_Y(U_{i+1} x_{i+2} \cdots U_r)$;

$Y_1' = \phi_{Y'}(U_0 x_1 U_1 x_2 \cdots U_{i-1})$, $Y_2' = \phi_{Y'}(U_{i+1} x_{i+2} \cdots U_r)$.

Now for some $Z_1$, $Z_2$, $Z_1'$, $Z_2'$,

$$Y_1 Z_1 \phi_Y(U_i) Z_2 \phi_Y(U_i) \phi_Y(x_{i+1}) Y_2 = Z = Y_1' Z_1' \phi_{Y'}(U_i) Z_2' \phi_{Y'}(U_i) \phi_{Y'}(x_{i+1}) Y_2',$$

with the following constraints:

(i) $|Y_1 \phi_Y(U_i)| \le |\phi_Y(U X_{i-1})| \le f(n)$;

(ii) $|Y_1' \phi_{Y'}(U_i)| \le |\phi_{Y'}(U X_{i-1})| \le f(n)$;

(iii) $2f(n) \le |Z_1| < g(n) - 2f(n)$;

(iv) $2f(n) \le |Z_1'| < g(n) - 2f(n)$;

(v) $|Z_1 \phi_Y(U_i) Z_2| = |\phi_Y(x_i)| > g(n) - f(n)$;

(vi) $|Z_1' \phi_{Y'}(U_i) Z_2'| = |\phi_{Y'}(x_i)| > g(n) - f(n)$.

As a consequence:

• $|Y_1 Z_1 \phi_Y(U_i)| < g(n) - f(n) < |Z_1' \phi_{Y'}(U_i) Z_2'|$, by (i), (iii), and (vi);

• $|Y_1 Z_1| \ge |Z_1| \ge 2f(n) > |Y_1'|$, by (iii) and (ii).

Therefore, the copy of $\phi_Y(U_i)$ added to Z is properly within the noted occurrence
of $Z_1' \phi_{Y'}(U_i) Z_2'$ in $Z'$, which is in the place of $\phi_{Y'}(x_i)$ in $Y'$. In particular, the added
copy of $\phi_Y(U_i)$ in Z interferes with neither $Y_1'$ nor the original copy of $\phi_{Y'}(U_i)$. Thus $Y_1'$
is an initial substring of Y and $\phi_{Y'}(U_i) \phi_{Y'}(x_{i+1}) Y_2'$ is a final substring of Y. Likewise,
$Y_1$ is an initial substring of $Y'$ and $\phi_Y(U_i) \phi_Y(x_{i+1}) Y_2$ is a final substring of $Y'$. By
the selection process of $\phi_Y$ and $\phi_{Y'}$, we know that $Y_1 = Y_1'$ and

$$\phi_Y(U_i) \phi_Y(x_{i+1}) Y_2 = \phi_{Y'}(U_i) \phi_{Y'}(x_{i+1}) Y_2'.$$

Finally, since $f(n)$ divides $|Z_1|$ and $|Z_1'|$, we deduce that $Z_1 = Z_1'$. Otherwise, the
added copies of $\phi_Y(U_i)$ in Z and of $\phi_{Y'}(U_i)$ in $Z'$ would not overlap, resulting in a
contradiction to the selection of $\phi_Y$ and $\phi_{Y'}$. Therefore, $Y = Y'$, concluding the proof
of Claim 2.

Now $C_Y \subset A_n^\alpha$ for $Y \in A_n^\beta$. Claims 1 and 2 together imply that $|A_n^\beta| = o(|A_n^\alpha|)$.

□
Observe that the choice of $\sqrt{n}$ in Lemma 4.7 was arbitrary. The proof works for
any function $f(n) = o(n)$ with $f(n) \to \infty$. Therefore, where Lemma 4.8 claims the
existence of some $g(n) \to \infty$, the statement is in fact true for all $g(n) \to \infty$.

Let $I_n(V, q)$ be defined as

$$I_n(V, q) = \frac{\left|\{W \in [q]^n \mid \phi(V) = W \text{ for some homomorphism } \phi : L(V)^* \to [q]^*\}\right|}{q^n}.$$

Note that $I_n(V, q)$ is equivalently defined as the probability that a uniformly randomly
selected length-n word over a fixed q-letter alphabet is an instance of V. Indeed, by
the nature of the instance relation, only the cardinality of the alphabet matters.


Definition 4.9. $\delta_{sur}(V, W)$ (with sur for surjection) is the number of factors of W
that are instances of V via a function $\phi$ with $\phi(V) = W$, divided by the total possible
such factors (1). More directly, $\delta_{sur}(V, W)$ is the characteristic function for the event
that W is an instance of V.

Fact 4.10 (Cooper and Rorabaugh 2015+, Fact 3.2). For any V and q and for
$W_n \in [q]^n$ chosen uniformly at random,

$$\binom{n+1}{2} E(\delta(V, W_n)) = \sum_{m=1}^{n} (n + 1 - m)\, E(\delta_{sur}(V, W_m)) = \sum_{m=1}^{n} (n + 1 - m)\, I_m(V, q).$$
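Fact 4.10 is an instance of linearity of expectation and can be confirmed exactly for small parameters. The sketch below assumes the identity carries the factor $\binom{n+1}{2}$ on the left-hand side, so that the left side is the expected number of instance factors; it enumerates all q-ary words, and every name in it is illustrative.

```python
from fractions import Fraction
from itertools import product

def is_instance(v, w, images=None):
    """True if w = phi(v) for some nonerasing homomorphism phi (simple backtracking)."""
    images = images or {}
    if not v:
        return w == ""
    x, rest = v[0], v[1:]
    if x in images:
        img = images[x]
        return w.startswith(img) and is_instance(rest, w[len(img):], images)
    return any(
        is_instance(rest, w[cut:], {**images, x: w[:cut]})
        for cut in range(1, len(w) - len(rest) + 1)
    )

def all_words(q, m):
    return ("".join(t) for t in product("0123456789"[:q], repeat=m))

def instance_probability(v, q, m):
    """I_m(V, q): probability that a uniform random q-ary word of length m is a V-instance."""
    return Fraction(sum(is_instance(v, w) for w in all_words(q, m)), q ** m)

def expected_instance_count(v, q, n):
    """C(n+1, 2) * E(delta(V, W_n)): expected number of factors that are V-instances."""
    total = sum(
        is_instance(v, w[i:j])
        for w in all_words(q, n)
        for i in range(n) for j in range(i + 1, n + 1)
    )
    return Fraction(total, q ** n)

def fact_4_10_rhs(v, q, n):
    return sum((n + 1 - m) * instance_probability(v, q, m) for m in range(1, n + 1))
```

Both sides agree exactly, for example for V = aba or V = aab with q = 2 and n = 6.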

Set $I(V, q) = \lim_{n \to \infty} I_n(V, q)$. When does this limit exist?

Theorem 4.11 (Cooper and Rorabaugh 2015+, Theorem 3.3). For nondoubled V
and integer $q \in \mathbb{Z}^+$, $I(V, q)$ exists. Moreover, $I(V, q) \ge q^{-\|V\|} > 0$.

Proof. If q = 1, then $I_n(V, q) = 1$ for $n \ge |V|$.

Assume $q \ge 2$. Let $V = TxU$ where x is the right-most nonrecurring letter in V.
Let $\Gamma = L(V)$ be the alphabet of letters in V. By Lemma 4.8, there is a nondecreasing
function $g(n) = o(n)$ such that, for a randomly chosen V-instance $W \in [q]^n$, there
is asymptotically almost surely a homomorphism $\phi : \Gamma^* \to [q]^*$ with $\phi(V) = W$ and
$|\phi(x_r)| \ge n - g(n)$.

Let $a_n$ be the number of $W \in [q]^n$ such that there exists $\phi : \Gamma^* \to [q]^*$ with
$\phi(V) = W$ and $|\phi(x_r)| \ge n - g(n)$. Lemma 4.8 tells us that $\frac{a_n}{q^n} \sim I_n(V, q)$. Note that
$\frac{a_n}{q^n}$ is bounded. It suffices to show that $a_{n+1} \ge q a_n$ for sufficiently large n. Pick n so
that $g(n) < \frac{n}{3}$.

For length-n Id-instance W counted by On, let (pw be a homomorphism that max¬ 
imizing \(l)w{xr)\ and, of such, minimizes |0iy(F)|- For each and each a G [q], 
let 0^ be the function such that, if (j)w{xr) = AB with 1^41 = [|0iy(a;r)|/2j, then 
— AaB; = ((>w{y) for each y G F\{x} Roughly speaking, we are sticking 

a into the middle of the image of x. 

Suppose we are double-counting, so — 0y(^)- 

\(j)w{xr)\/2 > (n - g{n))/2 > n/3 > g{n) > |0y(T17)| 

and vice-versa, the inserted a (resp., h) of one map does not appear in the image of 
TU under the other map. So (j)w{T) is an initial string and (j)w{U) a hnal string of 
(pyiV), and vice-versa. By the selection criteria of (pw and |0vi/(T)| = |0y(T)| 
and \(j)w{U)\ = |0y(f/)|. Therefore the location of the added a in (j)w(y) and the 
added b in 0^(ld) are the same. Hence, a = b and W = Y. 

Moreover I{V,q) > qYYW > Q. □ 

Having established that $I(V, q)$ exists for all $V$ and $q$, we explore the limit value
in Chapter 5.

Corollary 4.12 (Cooper and Rorabaugh 2015+, Corollary 3.6). Let $V$ be a nondoubled
word on any alphabet. Fix an integer $q > 0$, and let $W_n \in [q]^n$ be chosen
uniformly at random. Then

\[ \lim_{n \to \infty} \mathbb{E}(\delta(V, W_n)) = I(V, q). \]
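Proof. Let $I = I(V, q)$ and $\epsilon > 0$. Pick $N = N_\epsilon$ sufficiently large so $|I - I_n(V, q)| < \epsilon/2$
when $n > N$. Applying Fact 4.10 for $n > \max(N, 4N/\epsilon)$,

\[
\begin{aligned}
|I - \mathbb{E}(\delta(V, W_n))|
 &= \left| I - \frac{1}{\binom{n+1}{2}} \sum_{m=1}^{n} (n+1-m)\, I_m(V, q) \right| \\
 &= \frac{1}{\binom{n+1}{2}} \left| \sum_{m=1}^{n} (n+1-m) \left( I - I_m(V, q) \right) \right| \\
 &\le \frac{1}{\binom{n+1}{2}} \sum_{m=1}^{n} (n+1-m) \left| I - I_m(V, q) \right| \\
 &= \frac{1}{\binom{n+1}{2}} \left[ \sum_{m=1}^{N} + \sum_{m=N+1}^{n} \right] (n+1-m) \left| I - I_m(V, q) \right| \\
 &< \frac{1}{\binom{n+1}{2}} \left[ \sum_{m=1}^{\lfloor \epsilon n/4 \rfloor} (n+1-m) \cdot 1 + \sum_{m=N+1}^{n} (n+1-m) \frac{\epsilon}{2} \right] \\
 &< \frac{1}{\binom{n+1}{2}} \left[ \frac{\epsilon n}{4} \cdot n + \binom{n+1}{2} \frac{\epsilon}{2} \right] \\
 &< \epsilon.
\end{aligned}
\]

□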

If there are multiple nonrecurring letters in $V$, then most long $V$-instances are
liable to have numerous homomorphisms. However, if there is exactly one nonrecurring
letter in $V$, Theorem 4.14 below provides an upper bound for $I(V, q)$ that, as $q \to \infty$,
approaches the lower bound from Theorem 4.11 above.

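Lemma 4.13. Let $V$ be a word with $L(V) = \{x_0, x_1, \ldots, x_n\}$, $|L(V)| = n + 1$, where
$x_0$ occurs $r_0 = 1$ time in $V$ and $x_k$ occurs $r_k > 1$ times in $V$ for each $k \in [n]$. For
$q, M \in \mathbb{Z}^+$, and $W_M \in [q]^M$ chosen uniformly at random,

\[ \mathbb{E}(\hom(V, W_M)) = \sum_{(i_0, \ldots, i_n) \in [M]^{n+1}} \max\left( M + 1 - \sum_{k=0}^{n} i_k r_k,\ 0 \right) q^{-\sum_{k=0}^{n} i_k (r_k - 1)}. \]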

Proof. For a given $W$, every encounter of $V$ in $W$ can be defined by the starting
location $j$ of the substring and the lengths $(i_k = |\phi(x_k)|)_{k=0}^{n}$ of the letter-images under
the homomorphism $\phi$.

To compute $\mathbb{E}(\hom(V, W_M))$ over random selection of $W_M \in [q]^M$, we sum over
all possible $j$ and $(i_k)_{k=0}^{n}$ the probability that, for every $k \le n$, the substrings of
length $i_k$ (which are to be the instances of $x_k$) are identical.

Our outside $(n+1)$-fold summation is over the possible lengths $i_k$, which are
positive integers with $|\phi(V)| = \sum_{k=0}^{n} i_k r_k \le M$. This leaves $M + 1 - |\phi(V)|$ possible
values for $j$, the starting location of the instance.

For each $k$, only one of the instances of $x_k$ can consist of arbitrary letters and
then the rest, with their $i_k (r_k - 1)$ letters, are determined. Thus, the probability of
an encounter for given $j$ and $(i_k)_{k=0}^{n}$ is $q^{-\sum_{k=0}^{n} i_k (r_k - 1)}$.

□
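
The counting argument above can be tested exactly on small cases. The following sketch assumes that $\hom(V, W)$ counts encounters, that is, pairs of a starting location and a tuple of letter-image lengths for which the induced images are consistent. The function names are hypothetical, and the brute-force side enumerates all $q^M$ words.

```python
from fractions import Fraction
from itertools import product


def count_encounters(pattern, word):
    """Number of (start, image-length tuple) pairs realizing pattern inside word."""
    letters = sorted(set(pattern))
    mult = [pattern.count(x) for x in letters]
    M = len(word)
    total = 0
    for lengths in product(range(1, M + 1), repeat=len(letters)):
        span = sum(i * r for i, r in zip(lengths, mult))
        if span > M:
            continue
        size = dict(zip(letters, lengths))
        for start in range(M - span + 1):
            images, pos, ok = {}, start, True
            for x in pattern:
                piece = word[pos:pos + size[x]]
                # Every occurrence of x must map to the same piece.
                if images.setdefault(x, piece) != piece:
                    ok = False
                    break
                pos += size[x]
            total += ok
    return total


def expected_hom_bruteforce(pattern, M, q):
    """Average number of encounters over all length-M words (assumes q <= 10)."""
    alphabet = "0123456789"[:q]
    total = sum(count_encounters(pattern, "".join(w)) for w in product(alphabet, repeat=M))
    return Fraction(total, q ** M)


def expected_hom_formula(pattern, M, q):
    """Right-hand side of Lemma 4.13, evaluated exactly."""
    mult = [pattern.count(x) for x in sorted(set(pattern))]
    total = Fraction(0)
    for lengths in product(range(1, M + 1), repeat=len(mult)):
        span = sum(i * r for i, r in zip(lengths, mult))
        determined = sum(i * (r - 1) for i, r in zip(lengths, mult))
        total += max(M + 1 - span, 0) * Fraction(1, q ** determined)
    return total
```

The two functions should agree for any pattern, since the formula is just linearity of expectation over starting locations and length tuples.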

Theorem 4.14. Let $V$ be a word with $L(V) = \{x_0, x_1, \ldots, x_n\}$, $|L(V)| = n + 1$,
where $x_0$ occurs once in $V$ and $x_k$ occurs $r_k > 1$ times in $V$ for each $k \in [n]$. Then
for $q \ge 2$,

\[ I(V, q) \le \prod_{k=1}^{n} \frac{1}{q^{(r_k - 1)} - 1}. \]

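Proof. For $(i_1, \ldots, i_n) \in (\mathbb{Z}^+)^n$, let $M_\ell = M - \sum_{k > \ell} i_k r_k$ for $-1 \le \ell \le n$, so $M_n = M$
and $M_{\ell - 1} = M_\ell - i_\ell r_\ell$. Then Lemma 4.13 says

\[ \mathbb{E}(\hom(V, W_M)) = \sum_{(i_0, \ldots, i_n) \in [M]^{n+1}} \max(M_{-1} + 1,\ 0)\, q^{-\sum_{k=0}^{n} i_k (r_k - 1)}. \]

Since $M_0 (M_0 + 1)$ is always nonnegative,

\[
\begin{aligned}
\mathbb{E}(\hom(V, W_M))
 &= \sum_{\substack{(i_0, \ldots, i_n) \in [M]^{n+1} \\ M \ge \sum_{k=0}^{n} i_k r_k}} (M_{-1} + 1)\, q^{-\sum_{k=1}^{n} i_k (r_k - 1)} \\
 &= \sum_{\substack{(i_1, \ldots, i_n) \in [M]^{n} \\ M > \sum_{k=1}^{n} i_k r_k}} q^{-\sum_{k=1}^{n} i_k (r_k - 1)} \sum_{i_0 = 1}^{M_0} (M_0 - i_0 + 1) \\
 &= \sum_{\substack{(i_1, \ldots, i_n) \in [M]^{n} \\ M > \sum_{k=1}^{n} i_k r_k}} \frac{1}{2} M_0 (M_0 + 1)\, q^{-\sum_{k=1}^{n} i_k (r_k - 1)} \\
 &\le \frac{1}{2} \sum_{(i_1, \ldots, i_n) \in (\mathbb{Z}^+)^n} M_0 (M_0 + 1)\, q^{-\sum_{k=1}^{n} i_k (r_k - 1)}.
\end{aligned}
\]

Claim: For $0 \le \ell \le n$,

\[ \sum_{(i_1, \ldots, i_n) \in (\mathbb{Z}^+)^n} M_0 (M_0 + 1)\, q^{-\sum_{k=1}^{n} i_k (r_k - 1)} = \sum_{(i_{\ell+1}, \ldots, i_n) \in (\mathbb{Z}^+)^{n - \ell}} R_\ell(q, M_\ell)\, q^{-\sum_{k=\ell+1}^{n} i_k (r_k - 1)}, \]

where $R_\ell(q, x) \in \mathbb{R}[x]$ is a quadratic polynomial with coefficients depending on $q$ and

\[ [x^2]\left( R_\ell(q, x) \right) = \prod_{k=1}^{\ell} \frac{1}{q^{(r_k - 1)} - 1}. \]

We already know the claim to be true for $\ell = 0$ with $R_0(q, x) = x^2 + x$. We
proceed in proving the full claim by induction on $\ell$. Assume the claim holds for $\ell - 1$
with $R_{\ell - 1}(q, x) = a x^2 + b x + c$. Then

\[
\begin{aligned}
\sum_{(i_\ell, \ldots, i_n) \in (\mathbb{Z}^+)^{n - \ell + 1}} & R_{\ell-1}(q, M_{\ell - 1})\, q^{-\sum_{k=\ell}^{n} i_k (r_k - 1)} \\
 &= \sum_{(i_{\ell+1}, \ldots, i_n) \in (\mathbb{Z}^+)^{n - \ell}} q^{-\sum_{k=\ell+1}^{n} i_k (r_k - 1)} \sum_{i_\ell = 1}^{\infty} \left( a (M_\ell - i_\ell r_\ell)^2 + b (M_\ell - i_\ell r_\ell) + c \right) q^{-i_\ell (r_\ell - 1)} \\
 &= \sum_{(i_{\ell+1}, \ldots, i_n) \in (\mathbb{Z}^+)^{n - \ell}} q^{-\sum_{k=\ell+1}^{n} i_k (r_k - 1)} \sum_{i = 1}^{\infty} \left( a' + b' i + c' i^2 \right) q^{-i (r_\ell - 1)},
\end{aligned}
\]

where $a' = a M_\ell^2 + b M_\ell + c$, $b' = -2 a M_\ell r_\ell - b r_\ell$, and $c' = a r_\ell^2$. Since $q^{-(r_\ell - 1)} \in (0, 1)$,
we have for some $d_1$ and $d_2$ dependent on $q$ and $r_\ell$:

\[ \sum_{i=1}^{\infty} q^{-i (r_\ell - 1)} = \frac{1}{q^{(r_\ell - 1)} - 1}; \qquad \sum_{i=1}^{\infty} i\, q^{-i (r_\ell - 1)} = d_1; \qquad \sum_{i=1}^{\infty} i^2 q^{-i (r_\ell - 1)} = d_2. \]

We complete the proof of the claim with

\[
\begin{aligned}
R_\ell(q, M_\ell) &= a' \frac{1}{q^{(r_\ell - 1)} - 1} + b' d_1 + c' d_2 \\
 &= \frac{a M_\ell^2 + b M_\ell + c}{q^{(r_\ell - 1)} - 1} + (-2 a M_\ell r_\ell - b r_\ell) d_1 + (a r_\ell^2) d_2 \\
 &= \frac{a}{q^{(r_\ell - 1)} - 1} M_\ell^2 + \left( \frac{b}{q^{(r_\ell - 1)} - 1} - 2 a r_\ell d_1 \right) M_\ell + \left( \frac{c}{q^{(r_\ell - 1)} - 1} - b r_\ell d_1 + a r_\ell^2 d_2 \right).
\end{aligned}
\]

To complete the proof of the theorem, apply the claim to $\ell = n$ and let $M \to \infty$:

\[ \mathbb{E}(\hom(V, W_M)) \le \frac{1}{2} R_n(q, M) = \frac{1}{2} \left( \prod_{k=1}^{n} \frac{1}{q^{(r_k - 1)} - 1} \right) M^2 + O(M). \]

Therefore,

\[ I(V, q) = \lim_{M \to \infty} \mathbb{E}(\delta(V, W_M)) \le \lim_{M \to \infty} \frac{\mathbb{E}(\hom(V, W_M))}{\binom{M+1}{2}} \le \prod_{k=1}^{n} \frac{1}{q^{(r_k - 1)} - 1}. \]

□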

4.3 Density of Doubled Words 

Our main dichotomy says that the average density of a doubled word in large random 
words (over a fixed alphabet with at least two letters) goes to 0. Thus the expected 
number of instances in a random word of length $n$ is $o(n^2)$. Perhaps we can find
lower-order asymptotics for the expected number of instances of a doubled word.

Henceforth, if $\binom{x}{y}$ is used with nonintegral $x$, we mean

\[ \binom{x}{y} = \frac{\prod_{i=0}^{y-1} (x - i)}{y!}. \]

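Proposition 4.15 (Cooper and Rorabaugh 2015+, Proposition 4.1). For $k \in \mathbb{Z}^+$ and
$\bar{r} = (r_1, \ldots, r_k) \in (\mathbb{Z}^+)^k$, let $a_n(\bar{r})$ be the number of $k$-tuples $\bar{a} = (a_1, \ldots, a_k) \in (\mathbb{Z}^+)^k$
so that $\sum_{i=1}^{k} a_i r_i = n$. Then $a_n(\bar{r}) \le \binom{n/d - 1}{k - 1}$, where $d = \gcd_{i \in [k]}(r_i)$.

Proof. If $d \nmid n$, then $a_n(\bar{r}) = 0$. Otherwise, for each $\bar{a}$ counted by $a_n(\bar{r})$, there is
a unique corresponding $\bar{b} \in (\mathbb{Z}^+)^k$ such that $1 \le b_1 < b_2 < \cdots < b_k = n/d$ and
$b_j = \frac{1}{d} \sum_{i=1}^{j} a_i r_i$. The number of strictly increasing $k$-tuples of positive integers with
largest value $n/d$ is $\binom{n/d - 1}{k - 1}$. □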
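
The bound in Proposition 4.15 can be checked numerically for small multiplicity vectors. This is a sketch with hypothetical function names; the bound is taken to be $\binom{n/d - 1}{k - 1}$ as in the proof above, and to be 0 when $d \nmid n$.

```python
from functools import reduce
from math import comb, gcd


def count_solutions(r, n):
    """Number of tuples of positive integers a with sum(a_i * r_i) == n."""
    if not r:
        return 1 if n == 0 else 0
    head, rest = r[0], r[1:]
    # Choose a_1 >= 1, then solve the remaining equation recursively.
    return sum(count_solutions(rest, n - a * head) for a in range(1, n // head + 1))


def proposition_bound(r, n):
    """Upper bound binom(n/d - 1, k - 1), or 0 when d does not divide n."""
    d = reduce(gcd, r)
    if n % d != 0:
        return 0
    return comb(n // d - 1, len(r) - 1)
```

When all multiplicities are equal, the map to increasing tuples is a bijection and the bound is attained, for example `count_solutions((2, 2), 8) == proposition_bound((2, 2), 8)`.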

Fix integer $q > 0$. The number of instances of $V$ in $[q]^n$ is $q^n I_n(V, q)$. Assume $V$ is
doubled. Let $\Gamma = L(V) = \{x_1, \ldots, x_k\}$ and $r_i$ be the multiplicity of $x_i$ in $V$ for each
$i \in [k]$. Let $d = \gcd_{i \in [k]}(r_i)$ and $r = \min_{i \in [k]}(r_i)$. Note that $I_n(V, q) = 0$ when $d \nmid n$.
But perhaps

\[ \lim_{\substack{n \to \infty \\ d \mid n}} \frac{q^n}{f(n)} I_n(V, q) \]

exists for some function $f$ that only depends on $q$ and $V$. For inspiration, note that
q^In{U'^,q) = 1 Z) wlien m \ n. Furthermore, using Proposition 4.15, 

g’^I„(I/,g) < E(hom(I/,IT0) < 

Now select some letter $x$ of multiplicity $r$ and let $U$ be $V$ with all copies of $x$
removed. When $r \mid (n - |U|)$, we can get a lower bound on the number of instances by
counting homomorphisms $\phi$ with $|\phi(U)| = |U| = |V| - r$:

g”I„(R,g) > gC-IC)A+(fc-i) = ^ 4 ^ 2 ) 

Conjecture 4.16 (Cooper and Rorabaugh 2015+, Conjecture 4.2). For $q \in \mathbb{Z}^+$, the
following limit exists:

d\n 
By (4.2), the limit (if it exists) cannot be 0. Theorem 4.11 is a special case of this
conjecture, with $d = r = 1$.

4.4 Concentration 

For doubled $V$ and $q \ge 2$, we established that the expectation of the density $\delta(V, W_n)$
converges to zero. What is the concentration of the distribution of this density? By
(4.1), we can bound the probability that randomly chosen $W_n \in [q]^n$ is a $V$-instance:



From this observation we get the following probabilistic result (which is only
interesting for $q, r > 1$).

Lemma 4.17 (Cooper and Rorabaugh 2015+, Lemma 5.1). Let $V$ be a word with
$k$ distinct letters, each occurring at least $r \in \mathbb{Z}^+$ times. Let $W_n \in [q]^n$ be chosen
uniformly at random. Recall that $\binom{n+1}{2} \delta(V, W_n)$ is the number of substrings of $W_n$ that
are $V$-instances. Then for any nondecreasing function $f(n) > 0$,



Proof. Since $\delta_{sur}(V, W) \in \{0, 1\}$,

\[ \sum_{m=1}^{\lfloor f(n) \rfloor} \sum_{\ell=0}^{n-m} \delta_{sur}(V, W_n[\ell, \ell + m]) < n \cdot f(n). \]


Therefore, 



n n—ra 


< E E P(^-r-(R,Wn[£,£ + m]) >0) 


™=r/(^)i ^=0 


n 


= ^ {n-m + l)¥{5sur{V,Wm) = l) 


m=r/(n)] 
<


< 

< 


(n-m + l) 

--[f(n)} V k + 1 J 


m=\f{n)] 


n 


^(n/d + k + 1 \ /(„)(l-r)/r 

k + 1 r 


^k+^qf{n){l-r)/r ^ 


□ 


Theorem 4.18 (Cooper and Rorabaugh 2015+, Theorem 5.2). Let $V$ be a doubled
word, $q \ge 2$, and $W_n \in [q]^n$ chosen uniformly at random. Then for $p \in \mathbb{Z}^+$, the $p$-th
raw moment and the $p$-th central moment of $\delta(V, W_n)$ are both $O\left((\log(n)/n)^p\right)$.


Proof. Let us use Lemma 4.17 to first bound the $p$-th raw moments for $\delta(V, W_n)$,
assuming $r \ge 2$. To minimize our bound, we define the following function on $n$,
which acts as a threshold for “short” substrings of a random length-$n$ word:

\[ s_p(n) = \frac{r}{1 - r} \log_q\left( n^{-(k + 5 + p)} \right) = t_p \log_q n, \]

where $t_p = \frac{r (k + 5 + p)}{r - 1} > 0$.


[n-Sp(n)J 




(’^r) 


< 


" I /ri+lX 

i=rn-Sp(n)] \V 2 

P 

k+5^Sp(n){l-r)/r 


n 


Spin) 
n+l 
2 


- 1 - 


ntp loggn \ . ^^^_(fc+5+p)^ 

. {+) I 


= Or, 


logn 


n 
Setting $p = 1$, $E_n = \mathbb{E}(\delta(V, W_n)) < (c \log n)/n$ for some large $c$. We use this upper
bound on the expectation (1st raw moment) to bound the central moments.


E{\6{V,Wn)-Er. 


("■) 


Ln-Sp(n)J / 

< 5 : F\S{V,Wn) = 


i=0 


("f) 

p|<5(v;»y = 


_E 

(”r) 

^ clognV 
k n ) 




i=\nsp{ny\ 


("f) 


( 1 )' 


< 


' c log n 


p 


k+5 Sp(n)(l—r)/r 


n 




= Or, 


logn 


n 


□ 


Corollary 4.19 (Cooper and Rorabaugh 2015+, Corollary 5.3). Let $V$ be a doubled
word, $q \ge 2$, and $W_n \in [q]^n$ chosen uniformly at random. Then

\[ \frac{1}{n} \ll \mathbb{E}(\delta(V, W_n)) \ll \frac{\log n}{n}. \]

Proof. The upper bound was stated explicitly in the proof of Theorem 4.18. The lower
bound follows from an observation in Section 1.6: “the event that $W_n[b|V|, (b+1)|V|]$
is an instance of $V$ has nonzero probability and is independent for distinct $b \in \mathbb{N}$.”
Hence

\[ \mathbb{E}(\delta(V, W_n)) \ge \frac{1}{\binom{n+1}{2}} \left\lfloor \frac{n}{|V|} \right\rfloor I_{|V|}(V, q) = \Omega(n^{-1}). \]

□


The bound that Theorem 4.18 gives on the variance (2nd central moment) is not 
very interesting. However, we obtain nontrivial concentration using covariance and 
the fact that most “short” substrings in a word do not overlap. 
Theorem 4.20 (Cooper and Rorabaugh 2015+, Theorem 5.4). Let $V$ be a doubled
word, $q \ge 2$, and $W_n \in [q]^n$ chosen uniformly at random. Then

\[ \mathrm{Var}(\delta(V, W_n)) = O\left( \mathbb{E}(\delta(V, W_n))^2\, \frac{(\log n)^3}{n} \right). \]

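Proof. Let $X_n = \binom{n+1}{2} \delta(V, W_n)$ be the random variable counting the number of
substrings of $W_n$ that are $V$-instances. For fixed $n$, let $X_{a,b}$ be the indicator variable
for the event that $W_n[a, b]$ is a $V$-instance, so $X_n = \sum_{0 \le a < b \le n} X_{a,b}$. We use
$(a, b) \sim (c, d)$ to denote that $[a, b]$ and $[c, d]$ overlap. Note that

\[
\begin{aligned}
\mathrm{Cov}(X_{a,b}, X_{c,d}) &\le \mathbb{E}(X_{a,b} X_{c,d}) \\
 &\le \min\left( \mathbb{E}(X_{a,b}), \mathbb{E}(X_{c,d}) \right) \\
 &= \min\left( I_{(b-a)}(V, q), I_{(d-c)}(V, q) \right),
\end{aligned}
\]

and for $i \in \{b - a, d - c\}$,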

For $i < n/3$, the number of intervals in $W_n$ of length at most $i$ that overlap a fixed
interval of length $i$ is less than $(3i)^2$. Let $s(n) = s_0(n) = t_0 \log_q n$ as defined in
Theorem 4.18. For sufficiently large $n$,


Var(X„) = CoY{Xa,b,X,^a) 


0<a<b<n 

0<c<d<n 


< min(I(b_a)(R,g),I(6_a)(R,g)) 

(a,6)~(c,(i) 


E + E 

(a,6)~(c,d) (a,6)~(c,(i) 

_b—a,d—c<s{n) else 

hWJ /q,-\ 

< 2 + 


mm 


(I(6_a)(R,g),I(fe_a)(R,g)) 


/ I 1 fi/dpkpl\ i(i_n)/r 
< 2s{n)n{3s{n)y +
= 18 (to logg n)^n + 

= 0(n(logn)^). 


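Since $\mathbb{E}(\delta(V, W_n)) = \Omega(n^{-1})$ by Corollary 4.19,

\[
\mathrm{Var}(\delta(V, W_n)) = \mathrm{Var}\left( \frac{X_n}{\binom{n+1}{2}} \right) = \frac{\mathrm{Var}(X_n)}{\binom{n+1}{2}^2} = O\left( \frac{(\log n)^3}{n^3} \right) = O\left( \mathbb{E}(\delta(V, W_n))^2\, \frac{(\log n)^3}{n} \right).
\]

□

Question 4.21 (Cooper and Rorabaugh 2015+, Question 5.5). For nondoubled word
$V$, what is the concentration of the density distribution of $V$ in random words?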
Chapter 5


Asymptotic Probability of Being Zimin 


In Chapter 2, we investigated bounds on the length of words that avoid Zimin words.
In subsequent chapters, we proceeded to develop the theory of word densities, some
of which applies to Zimin words.

We proved in Chapter 4 that the asymptotic instance probability of $V$ in $q$-ary
words, $I(V, q) = \lim_{n \to \infty} I_n(V, q)$, exists for any word $V$, and is equal to the asymptotic
expected density of $V$ in random words. We also proved the following dichotomy for
$q \ge 2$ (Theorem 4.4): $I(V, q) = 0$ if and only if $V$ is doubled (that is, every letter in $V$
occurs at least twice). Trivially, if $V$ is composed of $k$ distinct, nonrecurring letters,
then $I_n(V, q) = 1$ for $n \ge k$, so $I(V, q) = 1$. But if $V$ contains at least one recurring
letter, it becomes a nontrivial task to compute $I(V, q)$.


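Corollary 5.1. For $n, q \in \mathbb{Z}^+$,

\[ q^{-(2^n - n - 1)} \le I(Z_n, q) \le \prod_{j=1}^{n-1} \frac{1}{q^{(2^j - 1)} - 1}. \]

Proof. For the lower bound, note that $\|Z_n\| = |Z_n| - |L(Z_n)| = (2^n - 1) - (n)$.
Theorem 4.11 tells us that for all $q \in \mathbb{Z}^+$ and nondoubled $V$, $I(V, q) \ge q^{-\|V\|}$.

For the upper bound, observe that the $n$ letters occurring in $Z_n$ have multiplicities
$\langle r_j = 2^j : 0 \le j < n \rangle$. Since there is exactly one nonrecurring letter in $Z_n$ ($r_0 = 2^0 = 1$),
Theorem 4.14 provides an upper bound of

\[ \prod_{j=1}^{n-1} \frac{1}{q^{(r_j - 1)} - 1} = \prod_{j=1}^{n-1} \frac{1}{q^{(2^j - 1)} - 1}. \]

□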
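
To see how tight these bounds are, they can be evaluated exactly and compared with the numerical values reported in Table 5.1. The sketch below uses a hypothetical function name and assumes $q \ge 2$, as required by Theorem 4.14.

```python
from fractions import Fraction


def zimin_bounds(n, q):
    """Lower and upper bounds on I(Z_n, q) from Corollary 5.1 (assumes q >= 2)."""
    lower = Fraction(1, q ** (2 ** n - n - 1))
    upper = Fraction(1)
    for j in range(1, n):
        upper *= Fraction(1, q ** (2 ** j - 1) - 1)
    return lower, upper
```

For instance, `zimin_bounds(3, 2)` returns the pair (1/16, 1/7), which brackets the tabulated value 0.1194437 for $I(Z_3, 2)$.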


A nice property of these bounds is that they are asymptotically equivalent as
$q \to \infty$. For some specific $V$, we can do better. In this chapter, we provide infinite
series for computing the asymptotic instance probability $I(V, q)$ for two Zimin words,
$V = Z_2 = aba$ (Section 5.1) and $V = Z_3 = abacaba$ (Section 5.2). Table 5.1 below
gives numerical approximations for $2 \le q \le 6$. Our method also provides bounds on
$I(Z_n, q)$ for general $n$ (Section 5.3).


Table 5.1 Approximate values of $I(Z_2, q)$ and $I(Z_3, q)$ for $2 \le q \le 6$.

q              2          3          4          5          6
$I(Z_2, q)$    0.7322132  0.4430202  0.3122520  0.2399355  0.1944229
$I(Z_3, q)$    0.1194437  0.0183514  0.0051925  0.0019974  0.0009253


5.1 Calculating $I(Z_2, q)$

Let $a_\ell = a_\ell^{(q)}$ be the number of bifix-free $q$-ary strings of length $\ell$. For $q = 2$, this
is sequence oeis.org/A003000; for $q = 3$, oeis.org/A019308 (OEIS Foundation Inc.
2011).

Lemma 5.2. If word $W$ has a bifix, then it has a bifix of length at most $\lfloor |W|/2 \rfloor$.

Proof. Let $W$ be a word with minimal-length bifix of length $k$, $\lfloor |W|/2 \rfloor < k < |W|$.
Then we can write $W = W_1 W_2 W_3$ where $W_1 W_2 = W_2 W_3$ and $|W_1 W_2| = k = |W_2 W_3|$.
But then $W$ has bifix $W_2$ with $|W_2| < k$, which contradicts our selection of the
shortest bifix of $W$. □
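
Lemma 5.2 is easy to confirm exhaustively for short words. The sketch below assumes that a bifix of $W$ is a nonempty proper prefix of $W$ that is also a suffix of $W$; the function names are hypothetical.

```python
from itertools import product


def bifix_lengths(word):
    """Lengths k, 0 < k < |word|, for which the length-k prefix equals the length-k suffix."""
    return [k for k in range(1, len(word)) if word[:k] == word[-k:]]


def lemma_holds_up_to(max_len, alphabet="01"):
    """Check that every word with a bifix has one of length at most floor(|W|/2)."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            lengths = bifix_lengths("".join(letters))
            if lengths and min(lengths) > n // 2:
                return False
    return True
```

For example, `bifix_lengths("abaab")` returns `[2]`, since `ab` is both a prefix and a suffix.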

Lemma 5.3. $a_\ell = a_\ell^{(q)}$ has the following recursive definition:

\[
\begin{aligned}
a_0 &= 0; \\
a_1 &= q; \\
a_{2k} &= q a_{2k-1} - a_k; \\
a_{2k+1} &= q a_{2k}.
\end{aligned}
\]


Proof. Fix a $q$-letter alphabet. Let $W = UV$ be a bifix-free word with $|U| = \lceil |W|/2 \rceil$
and $|V| = \lfloor |W|/2 \rfloor$. Suppose $UaV$ has a bifix for some letter $a$. Then by Lemma 5.2,
$UaV$ has a bifix of length at most $|UaV|/2$. But $W$ is bifix free, so the only
possibility is $U = aV$.

Therefore, for every bifix-free word of length $2k$ there are $q$ bifix-free words of
length $2k + 1$. For every bifix-free word of length $2k - 1$, there are $q$ bifix-free words
of length $2k$, with the exception of the length-$2k$ words that are the square of a
bifix-free word of length $k$. □
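
The recursion of Lemma 5.3 can be compared against a direct enumeration of bifix-free words. This is a minimal sketch with hypothetical function names; a word counts as bifix-free when no nonempty proper prefix is also a suffix, and the enumeration assumes $q \le 10$.

```python
from itertools import product


def bifix_free_counts(q, n_max):
    """a_0, ..., a_{n_max} from the recursion a_{2k} = q*a_{2k-1} - a_k, a_{2k+1} = q*a_{2k}."""
    a = [0] * (n_max + 1)
    if n_max >= 1:
        a[1] = q
    for n in range(2, n_max + 1):
        a[n] = q * a[n - 1] - (a[n // 2] if n % 2 == 0 else 0)
    return a


def bifix_free_bruteforce(q, n):
    """Count length-n words over q letters with no bifix, by enumeration."""
    alphabet = "0123456789"[:q]  # assumption: q <= 10
    count = 0
    for letters in product(alphabet, repeat=n):
        w = "".join(letters)
        if not any(w[:k] == w[-k:] for k in range(1, n)):
            count += 1
    return count
```

For $q = 2$ the recursion gives 2, 2, 4, 6, 12, 20, 40, 74 for lengths 1 through 8.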


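Theorem 5.4. For $q \ge 2$,

\[ I(Z_2, q) = \sum_{i=0}^{\infty} \frac{(-1)^i q^{(1 - 2^{i+1})}}{\prod_{j=0}^{i} \left( 1 - q^{(1 - 2^{j+1})} \right)}. \]

Proof. Since $a_\ell$ counts bifix-free words, the number of $q$-ary words of length
$M$ that are $Z_2$-instances is (without double-count)

\[ \sum_{\ell=0}^{\lceil M/2 \rceil - 1} a_\ell\, q^{M - 2\ell}, \]

so the proportion of $q$-ary words of length $M$ that are $Z_2$-instances is

\[ \frac{1}{q^M} \sum_{\ell=0}^{\lceil M/2 \rceil - 1} a_\ell\, q^{M - 2\ell} = \sum_{\ell=0}^{\lceil M/2 \rceil - 1} a_\ell\, q^{-2\ell}. \]

Therefore $I(Z_2, q) = f(1/q^2)$, where $f(x) = f^{(q)}(x)$ is the generating function for
$\{a_\ell\}_{\ell=0}^{\infty}$:

\[ f(x) = \sum_{\ell=0}^{\infty} a_\ell x^\ell. \]

From the recursive definition of $a_\ell$, we obtain the functional equation

\[ f(x) = qx + qx f(x) - f(x^2). \tag{5.1} \]

Solving (5.1) for $f(x)$ gives

\[ f(x) = \frac{qx - f(x^2)}{1 - qx} = \sum_{i=0}^{\infty} \frac{(-1)^i q x^{2^i}}{\prod_{j=0}^{i} \left( 1 - q x^{2^j} \right)}. \]

□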
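
The series for $I(Z_2, q)$ converges very quickly, so a handful of terms suffices for a numerical check against Table 5.1. The sketch below evaluates both the alternating series at $x = q^{-2}$ and the direct sum $\sum_\ell a_\ell q^{-2\ell}$ using the recursion of Lemma 5.3. The function names and the truncation parameters `terms` and `n_max` are assumptions of this illustration.

```python
from fractions import Fraction


def instance_prob_z2(q, terms=8):
    """Alternating series for I(Z_2, q), i.e. f(x) at x = q^(-2), truncated."""
    y = Fraction(1, q * q)      # y = x^(2^i), starting with i = 0
    denom = Fraction(1)
    total = Fraction(0)
    for i in range(terms):
        denom *= 1 - q * y
        total += (-1) ** i * q * y / denom
        y = y * y
    return float(total)


def instance_prob_z2_direct(q, n_max=60):
    """Partial sum of a_l * q^(-2l), with a_l from the bifix-free recursion."""
    a_prev, total = 0, Fraction(0)
    a = [0]
    for n in range(1, n_max + 1):
        a_n = q if n == 1 else q * a_prev - (a[n // 2] if n % 2 == 0 else 0)
        a.append(a_n)
        a_prev = a_n
        total += Fraction(a_n, q ** (2 * n))
    return float(total)
```

Both functions should agree to many digits, and `instance_prob_z2(2)` should be close to the tabulated 0.7322132.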
Corollary 5.5. For $q \ge 2$:


1 

q 


< 1(^2, g) < 





Moreover, as $q \to \infty$,

\[ I(Z_2, q) = \frac{1}{q - 1} - \frac{1 + o(1)}{q^3}. \]

Proof. The lower bound follows from the fact that a word of length $M > 2$ is a
$Z_2$-instance when the first and last character are the same. This occurrence has
probability $1/q$. Note that $f^{(q)}(q^{-2})$ is an alternating series. Moreover, the terms in
absolute value are monotonically approaching 0; the routine proof of monotonicity can
be found in the appendices (Lemma C.1). Hence, the partial sums provide successively
better upper and lower bounds:

\[
\begin{aligned}
I(Z_2, q) &\ge \frac{1/q}{1 - 1/q} - \frac{1/q^3}{(1 - 1/q)(1 - 1/q^3)} \\
 &= \frac{1}{q - 1} - \frac{1 + o(1)}{q^3};
\end{aligned}
\]

\[
\begin{aligned}
I(Z_2, q) &\le \frac{1}{q - 1} - \frac{1 + o(1)}{q^3} + \frac{1/q^7}{(1 - 1/q)(1 - 1/q^3)(1 - 1/q^7)} \\
 &= \frac{1}{q - 1} - \frac{1 + o(1)}{q^3} + \frac{o(1)}{q^3}.
\end{aligned}
\]

□
Table 5.2 Approximate values of $I(Z_2, q)$ for $2 \le q \le 8$.

q                       2        3       4       5       6       7       8
$q^{-1}$                0.50000  .33333  .25000  .20000  .16667  .14286  .12500
$I(Z_2, q)$             0.73221  .44302  .31225  .23994  .19442  .16326  .14062
$(q-1)^{-1} - q^{-3}$   0.87500  .46296  .31771  .24200  .19537  .16375  .14090
$(q-1)^{-1}$            1.00000  .50000  .33333  .25000  .20000  .16667  .14286


5.2 Calculating $I(Z_3, q)$

We will use similar methods to compute $I(Z_3, q)$. To avoid unnecessary subscripts and
superscripts, assume throughout this section that we are using a fixed alphabet with
$q > 1$ letters, unless explicitly stated otherwise. Since $Z_2$ has more interesting structure
than $Z_1$, there are more cases to consider in developing the necessary recursion.

Lemma 5.6. Fix bifix-free word $L$. Let $W = LAL$ be a $Z_2$-instance with a $Z_2$-bifix.
Then $LAL$ can be written in exactly one of the following ways:

(i) $LAL = LBLCLBL$ with $LBL$ the shortest $Z_2$-bifix of $W$ and $|C| > 0$;

(ii) $LAL = LBLLBL$ with $LBL$ the shortest $Z_2$-bifix of $W$;

(iii) $LAL = LBLBL$ with $LBL$ the shortest $Z_2$-bifix of $W$;

(iv) $LAL = LLFLLFLL$ with $LLFLL$ the shortest $Z_2$-bifix of $W$;

(v) $LAL = LLLL$.

Proof. With some thought, the reader should recognize that the five listed cases are
in fact mutually exclusive. The proof that these are the only possibilities follows.

Given that $W$ has a $Z_2$-bifix and $L$ is bifix-free, it follows that $W$ has a $Z_2$-bifix
$LBL$ for some nonempty $B$. Let $LBL$ be chosen of minimal length. We break this
proof into nine cases depending on the lengths of $L$ and $LBL$ (Figure 5.1). Set
$m = |W|$, $\ell = |L|$, and $k = |LBL|$.
Figure 5.1 All possible ways the minimal $Z_2$-bifix of $W$ can overlap, with
$m = |W|$, $\ell = |L|$, and $k = |LBL|$


Case (1): $2k < m$. This is (i).

Case (2): $2k = m$. This is (ii).

Case (3): $m < 2k < m + \ell$. In $LAL$, the first and last occurrences of $LBL$ overlap by
a length strictly between 0 and $\ell$. This is impossible, since $L$ is bifix-free.

Case (4): $2k = m + \ell$. This is (iii).

Case (5): $m + \ell < 2k < m + 2\ell$. The first and last occurrences of $LBL$ overlap by a
length strictly between $\ell$ and $2\ell$. This is impossible, since $L$ is bifix-free.

Case (6): $m + 2\ell = 2k < 2(m - \ell)$. $LAL = L(DL)(LE)L$ where $DL = B = LE$.
Thus $L$ is a bifix of $B$, so $LAL = LLFLLFLL$ where $B = LFL$. If $|F| > 0$,
this is (iv). If $|F| = 0$, then $LAL = LLLLLL$. But this contradicts the
minimality of $LBL$, since $LLLLLL$ has $Z_2$-bifix $LLL$, which is shorter than
$LBL = LLLL$.

Case (7): $m + 2\ell < 2k < 2(m - \ell)$. $LAL = LDLELD'L$ where $DLE = B = ELD'$.
Since $EL$ is a prefix of $B$, $LEL$ is a prefix of $LAL$. Likewise, since $LE$ is a
suffix of $B$, $LEL$ is a suffix of $LAL$. Therefore, $LEL$ is a bifix of $LAL$ and
$|LEL| < |LDLEL| = |LBL|$, contradicting the minimality of $LBL$.

Case (8): $k = m - \ell$. $LAL = LLCLL$ where $LC = B = CL$. If $|C| = 0$, this is (v).
Otherwise, $LCL$ is a bifix of $LAL$, contradicting the minimality of $LBL$.

Case (9): $m - \ell < k < m$. The first and last occurrences of $LBL$ overlap by a length
strictly between $k - \ell$ and $k$. This is impossible, since $L$ is bifix-free.


□ 

For fixed bifix-free word $L$ of length $\ell$, define $b_m^{\ell}$ to count the number of $Z_2$ words
with bifix $L$ that are $Z_2$-bifix-free $q$-ary words of length $m$. Then

\[ I(Z_3, q) = \sum_{\ell=1}^{\infty} \left( a_\ell \sum_{m=1}^{\infty} b_m^{\ell}\, q^{-2m} \right). \tag{5.2} \]

In order to form a recursive definition of $b_n$ as we did for $a_n$, we now describe two
new terms. Let $AB$ be a word of length $n$ with $|A| = \lceil n/2 \rceil$ and $|B| = \lfloor n/2 \rfloor$.
Then $AB$ has $q$ length-$(n+1)$ children of the form $AxB$, each having $AB$ as its parent.
In this way every nonempty word has exactly $q$ children and exactly 1 parent, which
establishes the $1{:}q$ ratio of words of length $n$ to words of length $n + 1$. The set of a
word’s children together with successive generations of progeny we refer to as that
word’s descendants.
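
The parent and child relation is simple to state in code: a child inserts one letter at the midpoint, and the parent deletes the middle letter. This is a small illustrative sketch; the function names are hypothetical.

```python
def children(word, alphabet):
    """All words AxB where word = AB with |A| = ceil(n/2) and |B| = floor(n/2)."""
    half = (len(word) + 1) // 2
    return [word[:half] + x + word[half:] for x in alphabet]


def parent(word):
    """Delete the middle letter, inverting the children() construction."""
    mid = len(word) // 2
    return word[:mid] + word[mid + 1:]
```

For example, `children("abcd", "xy")` gives `["abxcd", "abycd"]`, and applying `parent` to either child recovers `"abcd"`.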

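Theorem 5.7. $b_n^{\ell} = c_n^{\ell} + d_n^{\ell}$, where $c_n = c_n^{\ell}$ and $d_n = d_n^{\ell}$ are defined recursively as
follows:

For even $\ell$:

\[
\begin{aligned}
c_1 = \cdots = c_{2\ell} &= 0, \\
c_{2\ell+1} &= q, \\
c_{4\ell} &= q c_{4\ell-1} - (c_{5\ell/2} + 1), \\
c_{5\ell} &= q c_{5\ell-1} - (c_{5\ell/2} + c_{3\ell} - 1), \\
c_{5\ell+1} &= q (c_{5\ell} + c_{3\ell} - 1), \\
c_{6\ell} &= q c_{6\ell-1} - (c_{3\ell} - 1 + c_{7\ell/2}); \\
c_{2k} &= q c_{2k-1} - (c_k + c_{k+\ell/2}) && \text{for } k > \ell,\ k \notin \{2\ell, 5\ell/2, 3\ell\}, \\
c_{2k+1} &= q (c_{2k} + c_{k+\ell/2}) && \text{for } k > \ell,\ k \ne 5\ell/2, \\
d_1 = \cdots = d_{4\ell} &= 0, \\
d_{4\ell+1} &= q, \\
d_{5\ell} &= q d_{5\ell-1} - 1, \\
d_{5\ell+1} &= q (d_{5\ell} + 1), \\
d_{6\ell} &= q d_{6\ell-1} - 1, \\
d_{2k} &= q d_{2k-1} - (d_k + d_{k+\ell} + d_{k+\ell/2}) && \text{for } k > 2\ell,\ k \notin \{5\ell/2, 3\ell\}, \\
d_{2k+1} &= q (d_{2k} + d_{k+\ell} + d_{k+\ell/2}) && \text{for } k > 2\ell,\ k \ne 5\ell/2.
\end{aligned}
\]

For odd $\ell > 1$:

\[
\begin{aligned}
c_1 = \cdots = c_{2\ell} &= 0, \\
c_{2\ell+1} &= q, \\
c_{4\ell} &= q \left( c_{4\ell-1} + c_{\lfloor 5\ell/2 \rfloor} \right) - (c_{2\ell} + 1), \\
c_{5\ell} &= q c_{5\ell-1} - (c_{3\ell} - 1), \\
c_{5\ell+1} &= q (c_{5\ell} + c_{3\ell} - 1) - c_{\lceil 5\ell/2 \rceil}, \\
c_{6\ell} &= q \left( c_{6\ell-1} + c_{\lfloor 7\ell/2 \rfloor} \right) - (c_{3\ell} - 1), \\
c_{2k} &= q \left( c_{2k-1} + c_{k+\lfloor \ell/2 \rfloor} \right) - c_k; && k > \ell,\ k \notin \{2\ell, \lceil 5\ell/2 \rceil, 3\ell\}, \\
c_{2k+1} &= q c_{2k} - c_{k+\lceil \ell/2 \rceil}; && k > \ell,\ k \ne \lfloor 5\ell/2 \rfloor, \\
d_1 = \cdots = d_{4\ell} &= 0, \\
d_{4\ell+1} &= q, \\
d_{5\ell} &= q d_{5\ell-1} - 1, \\
d_{5\ell+1} &= q (d_{5\ell} + 1), \\
d_{6\ell} &= q d_{6\ell-1} - 1, \\
d_{2k} &= q \left( d_{2k-1} + d_{k+\lfloor \ell/2 \rfloor} \right) - (d_k + d_{k+\ell}); && k > 2\ell,\ k \notin \{\lceil 5\ell/2 \rceil, 3\ell\}, \\
d_{2k+1} &= q (d_{2k} + d_{k+\ell}) - d_{k+\lceil \ell/2 \rceil}; && k > 2\ell,\ k \ne \lfloor 5\ell/2 \rfloor.
\end{aligned}
\]

For $\ell = 1$:

\[
\begin{aligned}
c_1 = c_2 &= 0, \\
c_3 &= q, \\
c_4 &= q c_3 - 1, \\
c_5 &= q c_4 - (c_3 - 1), \\
c_6 &= q (c_5 + c_3 - 1) - (c_3 - 1), \\
c_{2k} &= q (c_{2k-1} + c_k) - c_k; && k > 3, \\
c_{2k+1} &= q c_{2k} - c_{k+1}; && k > 2; \\
d_1 = \cdots = d_4 &= 0, \\
d_5 &= q - 1, \\
d_6 &= q (d_5 + 1) - 1, \\
d_{2k} &= q (d_{2k-1} + d_k) - (d_k + d_{k+1}); && k > 3, \\
d_{2k+1} &= q (d_{2k} + d_{k+1}) - d_{k+1}; && k > 2.
\end{aligned}
\]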

Proof. Fix a bifix-free word $L$ of length $\ell$. The full recursion is too messy to prove all
at once, so we build up to it in stages. Within each stage, $\sim$ indicates an incomplete
definition. Example word trees with small $q$ and short $L$ are found in Appendix D.

Stage I

Since $L$ is bifix free, any $Z_2$-instance with $L$ as a bifix has to be of greater length
than $2\ell$. Thus, $b_1 = \cdots = b_{2\ell} = 0$. The only such words of length $2\ell + 1$ are of the
form $LxL$ for some letter $x$; therefore, $b_{2\ell+1} = q$.

Every word of length $n > 2\ell + 1$ has $L$ as a bifix if and only if its parent has $L$
as a bifix. This is why, for $k > \ell$, the definition of $b_{2k}$ includes the term $q b_{2k-1}$, and
the definition of $b_{2k+1}$ includes the term $q b_{2k}$. If $b_n$ were counting $Z_2$-instances with
bifix $L$, we would be done. However, we do not want $b_n$ to count words that have a
$Z_2$-bifix. Thus, we must deal with each of the 5 cases listed in Lemma 5.6.

First, let us deal with case (ii): $LAL = LBLLBL$ with $LBL$ the shortest $Z_2$-bifix
of $LAL$. The number of these of length $2k$ ($k > \ell$) is $b_k$. Therefore, in the definition
of $b_{2k}$, we subtract $b_k$. Conveniently, the descendants of case-(ii) words are precisely
words of case (i). Therefore, we have accounted for two cases at once.

Next, let us look at case (iii): $LAL = LBLBL$ with $LBL$ the shortest $Z_2$-bifix of
$LAL$. For the moment, assume $|L| = \ell$ is even. Then $|LBLBL|$ is even. The number
of such words of length $2k$ ($k > \ell$) is $b_{k+\ell/2}$. We want to exclude words of this form,
but we do not necessarily want to exclude their children. Therefore, in the definition
of $b_{2k}$ we subtract $b_{k+\ell/2}$, but then we add $q b_{k+\ell/2}$ in the definition of $b_{2k+1}$.

Now we look at when $|L|$ is odd, so $|LBLBL|$ is odd. The number of such
words of length $2k + 1$ ($k > \ell$) is $b_{k+\lceil \ell/2 \rceil}$. Therefore, in the definition of $b_{2k+1}$
we subtract $b_{k+\lceil \ell/2 \rceil}$, but then we add $q b_{(k-1)+\lceil \ell/2 \rceil} = q b_{k+\lfloor \ell/2 \rfloor}$ in the definition of
$b_{(2(k-1)+1)+1} = b_{2k}$.

Our work so far renders the following tentative definition of $b_n$.


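For even $\ell$:

\[
\begin{aligned}
b_1 = \cdots = b_{2\ell} &= 0, \\
b_{2\ell+1} &= q, \\
b_{2k} &\sim q b_{2k-1} - (b_k + b_{k+\ell/2}) && \text{for } k > \ell, \\
b_{2k+1} &\sim q (b_{2k} + b_{k+\ell/2}) && \text{for } k > \ell.
\end{aligned}
\]

For odd $\ell$:

\[
\begin{aligned}
b_1 = \cdots = b_{2\ell} &= 0, \\
b_{2\ell+1} &= q, \\
b_{2k} &\sim q \left( b_{2k-1} + b_{k+\lfloor \ell/2 \rfloor} \right) - b_k && \text{for } k > \ell, \\
b_{2k+1} &\sim q b_{2k} - b_{k+\lceil \ell/2 \rceil} && \text{for } k > \ell.
\end{aligned}
\]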

We continue with case (iv): $LAL = LLFLLFLL$ with $LLFLL$ the shortest $Z_2$-bifix
of $LAL$. Note that $|LLFLLFLL|$ is even. It would appear that the number
of such words of length $2k$ would be $b_{k-\ell}$ (counting words of the form $LFL$), which
we could deal with in the same fashion as we did for case (iii). However, when
counting words of the form $LFL$, we do not want words of the form $LLGLL$, because
$LLFLLFLL = LLLGLLLLGLLL$ is already accounted for in case (i).

Stage II

To address this issue, we will define two different recursions. Let $d_n$ count the $Z_2$-instances
of the form $LLALL$ that are $Z_2$-bifix free. Let $c_n$ count all other $Z_2$-instances
of the form $LAL$ that are $Z_2$-bifix free. Therefore, $b_n = c_n + d_n$ by definition.

As with $b_n$, we quickly see that $c_n = 0$ for $n \le 2\ell$ and $c_{2\ell+1} = q$. Now the shortest
words counted by $d_n$ are of the form $LLxLL$ for some letter $x$, so $d_n = 0$ for $n \le 4\ell$
and $d_{4\ell+1} = q$.

To deal with cases (i) and (ii), we can do the same things as before, but recognizing
that $LL$ is a bifix of $LBLLBL$ if and only if $LL$ is a bifix of $LBL$. Therefore, subtract
$c_k$ in the definition of $c_{2k}$ and subtract $d_k$ in the definition of $d_{2k}$ (both for $k > \ell$).

We also deal with case (iii) as before, recognizing that $LL$ is a bifix of $LBLBL$ if
and only if $LL$ is a bifix of $LBL$. For even $\ell$: subtract $c_{k+\ell/2}$ in the definition of $c_{2k}$
and add $q c_{k+\ell/2}$ in the definition of $c_{2k+1}$; subtract $d_{k+\ell/2}$ in the definition of $d_{2k}$ and
add $q d_{k+\ell/2}$ in the definition of $d_{2k+1}$. For odd $\ell$: subtract $c_{k+\lceil \ell/2 \rceil}$ in the definition of
$c_{2k+1}$ and add $q c_{k+\lfloor \ell/2 \rfloor}$ in the definition of $c_{2k}$; subtract $d_{k+\lceil \ell/2 \rceil}$ in the definition of
$d_{2k+1}$ and add $q d_{k+\lfloor \ell/2 \rfloor}$ in the definition of $d_{2k}$.

Having split $b_n$ into $c_n$ and $d_n$, we can address case (iv): $LAL = LLFLLFLL$
with $LLFLL$ the shortest $Z_2$-bifix of $LAL$. These words are counted by $d_n$, not by
$c_n$, and there are $d_{k+\ell}$ such words of length $2k$. Therefore, we subtract $d_{k+\ell}$ in the
definition of $d_{2k}$ and add $q d_{k+\ell}$ in the definition of $d_{2k+1}$.

This brings us to the following tentative definitions of $c_n$ and $d_n$.

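For even $\ell$:

\[
\begin{aligned}
c_1 = \cdots = c_{2\ell} &= 0, \\
c_{2\ell+1} &= q, \\
c_{2k} &\sim q c_{2k-1} - (c_k + c_{k+\ell/2}), \\
c_{2k+1} &\sim q (c_{2k} + c_{k+\ell/2}); \\
d_1 = \cdots = d_{4\ell} &= 0, \\
d_{4\ell+1} &= q, \\
d_{2k} &\sim q d_{2k-1} - (d_k + d_{k+\ell} + d_{k+\ell/2}), \\
d_{2k+1} &\sim q (d_{2k} + d_{k+\ell} + d_{k+\ell/2}).
\end{aligned}
\]

For odd $\ell$:

\[
\begin{aligned}
c_1 = \cdots = c_{2\ell} &= 0, \\
c_{2\ell+1} &= q, \\
c_{2k} &\sim q \left( c_{2k-1} + c_{k+\lfloor \ell/2 \rfloor} \right) - c_k, \\
c_{2k+1} &\sim q c_{2k} - c_{k+\lceil \ell/2 \rceil}; \\
d_1 = \cdots = d_{4\ell} &= 0, \\
d_{4\ell+1} &\sim q, \\
d_{2k} &\sim q \left( d_{2k-1} + d_{k+\lfloor \ell/2 \rfloor} \right) - (d_k + d_{k+\ell}), \\
d_{2k+1} &\sim q (d_{2k} + d_{k+\ell}) - d_{k+\lceil \ell/2 \rceil}.
\end{aligned}
\]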
Stage III

Next, let us deal with case (v): $LLLL$. We merely need to subtract 1 in the definition
of $c_{4\ell}$. Since all of the words counted by $d_n$ are descendants of $LLLL$, this is what
prevents overlap of the words counted by $c_n$ and $d_n$.

There was a small omission in the previous stage. When dealing with cases (i)
and (ii), we pointed out that $LL$ is a bifix of $LBLLBL$ if and only if $LL$ is a bifix of
$LBL$. This was a true and important observation. The one problem is that $LLL$ has
$LL$ as a bifix but is not of the form $LLALL$. Therefore, $LLLLLL$ was “removed” in
the definition of $c_{6\ell}$ when it should have been “removed” from $d_{6\ell}$. We must account
for this by adding 1 in the definition of $c_{6\ell}$ and subtracting 1 in the definition of $d_{6\ell}$.

Similarly, in dealing with case (iii), we “removed” $LLLLL$ in the definition of $c_{5\ell}$
and “replaced” its children in the definition of $c_{5\ell+1}$. These should have happened to
$d_n$. Therefore, we add 1 and subtract $q$ in the definitions of $c_{5\ell}$ and $c_{5\ell+1}$, respectively,
then subtract 1 and add $q$ in the definitions of $d_{5\ell}$ and $d_{5\ell+1}$, respectively.

Since $LLL$ does not cause any trouble with case (iv), we are done building the
recursive definition for even $\ell$ as found in the theorem statement.

Stage IV

The recursion for odd $\ell$ has the additional caveat that $\ell \ne 1$. When $\ell = 1$, there exist
conflicts in the recursive definitions: $4\ell + 1 = 5\ell$ and $5\ell + 1 = 6\ell$. After consolidating
the “adjustments” for these cases, we get the definition for $\ell = 1$ as appears in the
theorem statement. □

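With our recursively defined sequences $a_n$ and $b_n$, the latter in terms of $c_n$ and
$d_n$, we are now able to formulate Theorem 5.4 for $Z_3$.

Theorem 5.8. For integers $q \ge 2$,

\[ I(Z_3, q) = \sum_{\ell=1}^{\infty} a_\ell \left( \sum_{i=0}^{\infty} G(i) + H(i) \right), \]

where

\[
\begin{aligned}
G(i) = G_\ell^{(q)}(i) &= \frac{(-1)^i\, r\!\left(q^{-2^{i+1}}\right) \prod_{j=0}^{i-1} s\!\left(q^{-2^{j+1}}\right)}{\prod_{k=0}^{i} \left( 1 - q^{(1 - 2^{k+1})} \right)}; \\
r(x) = r_\ell^{(q)}(x) &= q x^{2\ell+1} - x^{4\ell} + x^{5\ell} - q x^{5\ell+1} + x^{6\ell}; \\
s(x) = s_\ell^{(q)}(x) &= 1 - q x^{1-\ell} + x^{-\ell}; \\
H(i) = H_\ell^{(q)}(i) &= \frac{(-1)^i\, u\!\left(q^{-2^{i+1}}\right) \prod_{j=0}^{i-1} v\!\left(q^{-2^{j+1}}\right)}{\prod_{k=0}^{i} \left( 1 - q^{(1 - 2^{k+1})} \right)}; \\
u(x) = u_\ell^{(q)}(x) &= q x^{4\ell+1} - x^{5\ell} + q x^{5\ell+1} - x^{6\ell}; \\
v(x) = v_\ell^{(q)}(x) &= 1 - q x^{1-\ell} + x^{-\ell} - q x^{1-2\ell} + x^{-2\ell}.
\end{aligned}
\]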


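Proof. Recalling Equation (5.2),

\[
\begin{aligned}
I(Z_3, q) &= \sum_{\ell=1}^{\infty} \left( a_\ell \sum_{m=1}^{\infty} b_m\, q^{-2m} \right) \\
 &= \sum_{\ell=1}^{\infty} \left( a_\ell \sum_{m=1}^{\infty} (c_m + d_m)\, q^{-2m} \right).
\end{aligned}
\]

Similar to our proof for $I(Z_2, q)$, let us define generating functions for the sequences
$c_n = c_n^{\ell}$ and $d_n = d_n^{\ell}$:

\[ g(x) = g_\ell^{(q)}(x) = \sum_{i=1}^{\infty} c_i x^i \quad \text{and} \quad h(x) = h_\ell^{(q)}(x) = \sum_{i=1}^{\infty} d_i x^i. \]

Despite having to write the recursive relations three different ways, depending on
$\ell$, the underlying recursion is fundamentally the same and results in the following
functional equations:

\[
\begin{aligned}
g(x) &= q \left( x g(x) + x^{1-\ell} g(x^2) + x^{2\ell+1} - x^{5\ell+1} \right) \\
 &\quad - \left( g(x^2) + x^{-\ell} g(x^2) + x^{4\ell} - x^{5\ell} - x^{6\ell} \right);
\end{aligned}
\tag{5.3}
\]

\[
\begin{aligned}
h(x) &= q \left( x h(x) + x^{1-2\ell} h(x^2) + x^{1-\ell} h(x^2) + x^{4\ell+1} + x^{5\ell+1} \right) \\
 &\quad - \left( h(x^2) + x^{-2\ell} h(x^2) + x^{-\ell} h(x^2) + x^{5\ell} + x^{6\ell} \right).
\end{aligned}
\tag{5.4}
\]

Solving (5.3) for $g(x)$, we get

\[ g(x) = \frac{r(x) - s(x) g(x^2)}{1 - qx} \tag{5.5} \]

with $r(x)$ and $s(x)$ as defined in the theorem statement. Expanding (5.5) gives

\[
\begin{aligned}
g(x) &= \frac{r(x) - s(x) g(x^2)}{1 - qx} \\
 &= \frac{r(x)}{1 - qx} - \frac{s(x)}{1 - qx} \cdot \frac{r(x^2) - s(x^2) g(x^4)}{1 - q x^2} \\
 &= \sum_{i=0}^{\infty} \frac{(-1)^i\, r\!\left(x^{2^i}\right) \prod_{j=0}^{i-1} s\!\left(x^{2^j}\right)}{\prod_{k=0}^{i} \left( 1 - q x^{2^k} \right)}.
\end{aligned}
\tag{5.6}
\]

Likewise, solving (5.4) for $h(x)$, we get

\[ h(x) = \frac{u(x) - v(x) h(x^2)}{1 - qx} \tag{5.7} \]

\[ \phantom{h(x)} = \sum_{i=0}^{\infty} \frac{(-1)^i\, u\!\left(x^{2^i}\right) \prod_{j=0}^{i-1} v\!\left(x^{2^j}\right)}{\prod_{k=0}^{i} \left( 1 - q x^{2^k} \right)} \tag{5.8} \]

with $u(x)$ and $v(x)$ as defined in the theorem statement. Evaluating (5.6) and (5.8) at
$x = q^{-2}$ gives $\sum_{i} G(i)$ and $\sum_{i} H(i)$, respectively.
□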
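
The nested series of Theorem 5.8 can be evaluated with exact rational arithmetic and compared with the tabulated values of $I(Z_3, q)$. The following is a sketch only, not the Sage code referenced in Appendix C.2: the function names, the truncation parameters `max_len` and `terms`, and the explicit forms of $r$, $s$, $u$, $v$ (taken from the functional equations (5.3) and (5.4) as written above) are assumptions of this illustration.

```python
from fractions import Fraction


def bifix_free_counts(q, n_max):
    """a_0, ..., a_{n_max} via the recursion of Lemma 5.3."""
    a = [0] * (n_max + 1)
    if n_max >= 1:
        a[1] = q
    for n in range(2, n_max + 1):
        a[n] = q * a[n - 1] - (a[n // 2] if n % 2 == 0 else 0)
    return a


def alternating_series(numer, mult, x, q, terms):
    """Sum over i of (-1)^i numer(x^(2^i)) * prod_{j<i} mult(x^(2^j)) / prod_{k<=i} (1 - q x^(2^k))."""
    total = Fraction(0)
    mult_prod = Fraction(1)
    denom_prod = Fraction(1)
    y = x  # y holds x^(2^i)
    for i in range(terms):
        denom_prod *= 1 - q * y
        total += (-1) ** i * numer(y) * mult_prod / denom_prod
        mult_prod *= mult(y)
        y = y * y
    return total


def instance_prob_z3(q, max_len=12, terms=5):
    """Truncated evaluation of the series for I(Z_3, q)."""
    x = Fraction(1, q * q)
    a = bifix_free_counts(q, max_len)
    total = Fraction(0)
    for l in range(1, max_len + 1):
        def r(y, l=l):
            return q * y ** (2 * l + 1) - y ** (4 * l) + y ** (5 * l) - q * y ** (5 * l + 1) + y ** (6 * l)

        def s(y, l=l):
            return 1 - q * y ** (1 - l) + y ** (-l)

        def u(y, l=l):
            return q * y ** (4 * l + 1) - y ** (5 * l) + q * y ** (5 * l + 1) - y ** (6 * l)

        def v(y, l=l):
            return 1 - q * y ** (1 - l) + y ** (-l) - q * y ** (1 - 2 * l) + y ** (-2 * l)

        g_val = alternating_series(r, s, x, q, terms)
        h_val = alternating_series(u, v, x, q, terms)
        total += a[l] * (g_val + h_val)
    return float(total)
```

With the default truncation, the result for $q = 2$ should be close to the value 0.1194437 listed in Table 5.1. Exact fractions are used because the factors $s$ and $v$ grow quickly while the numerators $r$ and $u$ shrink, which makes floating-point evaluation fragile for larger $\ell$.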


Corollary 5.9. For integers N > 0 and M > 0, 

N /2M+1 \ 

E“d L (GW + 77W) < HZi.i); 

e=i \ i=o / 


N /2M \ 

1(23.9) < 9""+i:<i( ■ 

e=i \i=o / 


with $G(i) = G_\ell^{(q)}(i)$ and $H(i) = H_\ell^{(q)}(i)$ as defined in Theorem 5.8.


Proof. For fixed integers $q \ge 2$ and $\ell \ge 1$, $\sum_{i=0}^{\infty} (G(i) + H(i))$ is an alternating series.
We need to show that the sequence $|G(i) + H(i)|$ is decreasing. Since $(-1)^i G(i) > 0$
and $(-1)^i H(i) > 0$ for each $i$, $|G(i) + H(i)| = |G(i)| + |H(i)|$. Thus it suffices to show
that $\{|G(i)|\}_{i=0}^{\infty}$ and $\{|H(i)|\}_{i=0}^{\infty}$ are both decreasing sequences, the routine proof of
which can be found in the appendices (Lemma C.2).

Now for any integer $M \ge 0$:

2M+1 oo 2M 

■£ G,(i) + H,ii} < x: 

2 = 0 171=0 2=0 

Moreover, since the $a_\ell$ are nonnegative, the lower bound for the theorem is evident.
For a bifix-free word $L$ of length $\ell$, $\sum_{m=0}^{\infty} b_m q^{-2m}$ is the limit, as $M \to \infty$, of the
probability that a word of length $M$ is a $Z_3$-instance of the form $LALBLAL$. A
necessary condition for such a word is that it starts and ends with $L$, which (for
$M \ge 2\ell$) has probability $q^{-2\ell}$. Also $a_\ell$ counts the number of bifix-free words of length
$\ell$, so $a_\ell < q^\ell$. Hence for any integer $N > 0$:

N OO OO 

i=l m=0 i=N-\-l 

N oo oo 

i=l m=0 i=N-\-l 

N oo 

l=\ m=0 

□ 


Table 5.3 Approximate values of $I(Z_3, q)$ for $2 \le q \le 6$.

q              2           3           4           5           6
$I(Z_3, q)$    0.11944370  0.01835140  0.00519251  0.00199739  0.00092532


The values in Table 5.3 were generated by the Sage code found in Appendix C.2,
which was derived directly from Corollary 5.9 and can be used to compute $I(Z_3, q)$
to arbitrary precision for any q ≥ 2.
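As a cross-check on this computation, the following is a minimal plain-Python sketch of the bounds in Corollary 5.9, using exact rational arithmetic. It assumes the formulas for G, H, r, s, u, v exactly as they appear in the Sage code of Appendix C.2, and the recurrence for the bifix-free counts $a_\ell$ from the same listing. The function names (`term`, `bifix_free_counts`, `bounds`) are ours, and the tail term $q^{-N}/(q-1)$ is our own evaluation of $\sum_{\ell > N} q^{\ell} q^{-2\ell}$ from the proof above; it is not taken from the thesis code.

```python
from fractions import Fraction

def r(ell, q, x):
    X = x ** ell
    return q * x * X**2 - X**4 + X**5 - q * x * X**5 + X**6

def s(ell, q, x):
    return 1 - q * x ** (1 - ell) + x ** (-ell)

def u(ell, q, x):
    return (q * x ** (4 * ell + 1) - x ** (5 * ell)
            + q * x ** (5 * ell + 1) - x ** (6 * ell))

def v(ell, q, x):
    return (1 - q * x ** (1 - ell) + x ** (-ell)
            - q * x ** (1 - 2 * ell) + x ** (-2 * ell))

def term(top, bottom, ell, q, i):
    """Term i of the expanded series, evaluated at x = q^(-2).

    With x_k = q^(-2^(k+1)), this is
    (-1)^i top(x_i) * prod_{j<i} bottom(x_j) / prod_{k<=i} (1 - q x_k).
    """
    xs = [Fraction(1, q ** (2 ** (k + 1))) for k in range(i + 1)]
    value = Fraction((-1) ** i) * top(ell, q, xs[i])
    for j in range(i):
        value *= bottom(ell, q, xs[j])
    for k in range(i + 1):
        value /= 1 - q * xs[k]
    return value

def bifix_free_counts(q, N):
    """a[n] for n = 0..N, with a[2k] = q a[2k-1] - a[k] and a[2k+1] = q a[2k]."""
    a = [0, q]
    for n in range(2, N + 1):
        a.append(q * a[-1] - (a[n // 2] if n % 2 == 0 else 0))
    return a

def bounds(q, N, M):
    """Return (lower, upper) bounds on I(Z_3, q) as exact fractions."""
    a = bifix_free_counts(q, N)

    def partial(top_index):
        return sum(
            a[ell] * sum(term(r, s, ell, q, i) + term(u, v, ell, q, i)
                         for i in range(top_index + 1))
            for ell in range(1, N + 1))

    tail = Fraction(1, (q - 1) * q ** N)   # assumed tail: sum_{l > N} q^(-l)
    return partial(2 * M + 1), partial(2 * M) + tail

lower, upper = bounds(2, 12, 2)
print(float(lower), float(upper))
```

For q = 2 the lower bound agrees with the first entry of Table 5.3 to the digits shown there; the gap between the two bounds is dominated by the tail term, so a larger N tightens it.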

5.3 Bounding $I(Z_n, q)$ for Arbitrary n

This programme is not practical for n in general. The number of cases for a generalization of Lemma 3.1 is likely to grow with n. Even if that stabilizes somehow,
the expression for calculating $I(Z_n, q)$ requires n nested infinite series. Nevertheless,
ignoring some of the more subtle details, we proceed with this method to obtain a
not-overly-messy way to calculate bounds for $I(Z_n, q)$ in general.

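Fix a $Z_{n-1}$-instance L of length ℓ ≥ 1, and let $b_m^{\ell}$ be the number of words of length m
of the form LAL with A ≠ ε but not of the form LBLBL. That is, $b_m^{\ell}$ is an overcount
for the number of $Z_n$-instances of the form LAL. Then $b_m = b_m^{\ell}$ is recursively defined
as follows:

For even ℓ:
\[
\begin{aligned}
b_0 = \cdots = b_{2\ell} &= 0, \\
b_{2k} &= q\,b_{2k-1} - \bigl(b_k + b_{k+\ell/2}\bigr) && \text{for } k > \ell, \\
b_{2k+1} &= q\,\bigl(b_{2k} + b_{k+\ell/2}\bigr) && \text{for } k \ge \ell.
\end{aligned}
\]

For odd ℓ:
\[
\begin{aligned}
b_0 = \cdots = b_{2\ell} &= 0, \\
b_{2k} &= q\,\bigl(b_{2k-1} + b_{k+\lfloor \ell/2 \rfloor}\bigr) - b_k && \text{for } k > \ell, \\
b_{2k+1} &= q\,b_{2k} - b_{k+\lceil \ell/2 \rceil} && \text{for } k \ge \ell.
\end{aligned}
\]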

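Then the associated generating function $f(x) := f_\ell(x) = \sum_{m=1}^{\infty} b_m x^m$ satisfies

\[
f(x) = q\left(x^{2\ell+1} + x f(x) + x^{1-\ell} f(x^2)\right) - \left(f(x^2) + x^{-\ell} f(x^2)\right).
\]

Therefore, setting $t(x) = t_\ell^{(q)}(x) = 1 - qx^{1-\ell} + x^{-\ell}$,

\[
f(x) = \frac{q x^{2\ell+1} - t(x)\,f(x^2)}{1 - qx}
     = q \cdot \sum_{i=0}^{\infty} \frac{(-1)^i\, x^{2^i(2\ell+1)} \prod_{j=0}^{i-1} t(x^{2^j})}{\prod_{k=0}^{i} \left(1 - qx^{2^k}\right)}.
\]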
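The coefficients of f can be read off the functional equation directly, which gives a quick way to tabulate the sequence for small parameters. The sketch below is illustrative only: the function name `b_coefficients` is ours, and it takes the functional equation f(x) = q(x^{2ℓ+1} + x f(x) + x^{1-ℓ} f(x²)) - (f(x²) + x^{-ℓ} f(x²)) as its definition, so the seed coefficient q at x^{2ℓ+1} comes from that equation.

```python
def b_coefficients(q, ell, max_m):
    """Coefficients b_0..b_max_m of f, extracted from the functional equation.

    Comparing coefficients of x^m on both sides gives
      b_m = q*[m == 2*ell + 1] + q*b_{m-1}
            + q*b_j  where 2j + 1 - ell = m   (from q x^(1-ell) f(x^2))
            - b_j    where 2j = m             (from f(x^2))
            - b_j    where 2j - ell = m       (from x^(-ell) f(x^2)).
    All indices j that occur are smaller than m once m > 2*ell.
    """
    b = [0] * (max_m + 1)
    for m in range(2 * ell + 1, max_m + 1):
        value = q * b[m - 1]
        if m == 2 * ell + 1:
            value += q
        if (m + ell - 1) % 2 == 0:
            value += q * b[(m + ell - 1) // 2]
        if m % 2 == 0:
            value -= b[m // 2]
        if (m + ell) % 2 == 0:
            value -= b[(m + ell) // 2]
        b[m] = value
    return b

print(b_coefficients(2, 1, 7))   # odd ell
print(b_coefficients(2, 2, 8))   # even ell
```

For ℓ = 1 and q = 2 the first nonzero values are 2, 4, 6, which match a direct count of binary words 0A0 that are not of the form 0B0B0 for lengths 3, 4, and 5.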


Now $f(q^{-2})$ gives an upper bound for the limit (as word-length approaches infinity)
of the probability that a word is a $Z_n$-instance of the form LAL. We can write the
following expressions as upper bounds for $I(Z_n, q)$:


00 00 00 


q) < 




£0=1 £n = l’^=l 
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Ni Nn oo 

I(^n, q) < J2---J2J2 

£o=l £n = l'fn=^ 

oo 

+ n J2 ?“^- 

e=Ni+i 


A more precise recursion can be attained by extensive case-work, but the improvement gained is likely not worth the effort.
Chapter 6


Future Directions 


6.1 Word Densities 

6.1.1 Limit Factor Densities 

We saw in our density comparison of Section 3.1.1 that the limsup factor density of
$a^k$ is 1 for any $q, k \in \mathbb{Z}^+$. However, this is not the case for words with at least two
distinct letters. Generating functions or the de Bruijn graph may provide useful tools
for answering the following question.

Question 6.1. For q ≥ 2 and V with at least two distinct letters, what is

\[
\limsup_{\substack{W \in [q]^* \\ |W| \to \infty}} d(V, W)?
\]

6.1.2 Density Comparisons 

The plots of possible $Z_2$- and $Z_3$-densities in short binary words (Figure 3.2) suggest
a nonlinear asymptotic lower bound for $\delta(Z_3, W)$ in terms of $\delta(Z_2, W)$. Moreover,
it is surprising to observe that the minimum $Z_3$-density does not coincide with the
minimum Z 2 -density. Considering the words (aW)" with n —)■ 00 , we see that the 
absolute upper bound of ?/ = x is asymptotically tight, at least for x = 

Question 6.2. Over a fixed alphabet, what is the asymptotic lower bound for $\delta(Z_3, W)$
in terms of $\delta(Z_2, W)$?

6.1.3 Encounter Enumeration


Given a V-instance W, there might be multiple homomorphisms on L(V)* that produce W. For this reason, the number of encounters, hom(V, W), was only used to
hnd an upper bound for 5{V,W). However, the quantity ^ is not generally 

expected to be less than 1. The worst-case scenario is with factors of the form a^, for 
which every one of the partitions into \ V\ nonempty substrings gives a unique 

encounter. However, when V has exactly 1 nonrecurring letter, the lower and upper
bounds on I(V, q) (Theorems 4.11, 4.14) are asymptotic in q. So for such V and
large random IF, E(hom(F, IF)) is a good estimate for E ^))- Yet we 

see from the proof of Lemma 4.8, that if V has multiple nonrecurring letters, we can 
expect numerous homomorphisms for a given instance. 
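To make the multiplicity of homomorphisms concrete, here is a small Python sketch that counts the nonerasing homomorphisms φ with φ(V) = W by backtracking over the possible images of each letter of V. The name `count_homs` is ours and the routine is a brute-force illustration, not a method taken from the text.

```python
def count_homs(V, W):
    """Count nonerasing homomorphisms phi on the letters of V with phi(V) == W."""

    def extend(i, pos, phi):
        # i: next position in V to map; pos: how much of W is consumed so far.
        if i == len(V):
            return 1 if pos == len(W) else 0
        letter = V[i]
        if letter in phi:
            image = phi[letter]
            if W.startswith(image, pos):
                return extend(i + 1, pos + len(image), phi)
            return 0
        total = 0
        remaining = len(V) - i - 1          # letters of V still to be mapped
        # Leave at least one letter of W for every remaining letter of V.
        for end in range(pos + 1, len(W) - remaining + 1):
            phi[letter] = W[pos:end]
            total += extend(i + 1, end, phi)
            del phi[letter]
        return total

    return extend(0, 0, {})

print(count_homs("abc", "aaaaa"))   # three nonrecurring letters
print(count_homs("aba", "aaaaa"))   # one nonrecurring letter
```

With three nonrecurring letters, every way of cutting a^5 into three nonempty blocks is a distinct homomorphism, whereas for aba the recurring letter pins down most of the choices.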


Question 6.3. Fix q ≥ 2. Assuming a uniformly random selection of $W_n \in [q]^n$, let
$\hom_{n,\mathrm{sur}}(V, q)$ be the expected number of nonerasing homomorphisms $\phi : L(V)^* \to [q]^*$
such that $\phi(V) = W_n$. If V has exactly k nonrecurring letters, what is the
asymptotic growth of

\[
\frac{\hom_{n,\mathrm{sur}}(V, q)}{I_n(V, q)}
\]

in terms of n, k, and q?


6.1.4 Abelian Encounters 

In Problem (H.2) of a list of unsolved problems, Erdős (1961) suggested that ‘perhaps
an infinite sequence of four symbols can be formed without consecutive “identical”
[factors]’ where two words are “identical” provided ‘each symbol occurs the same
number of times in both of them (i.e., we disregard order).’ For a summary of the
history of this problem by Erdős, through its positive answer by Dekking (1979), see
Section 5.3 of Berstel et al. (2008). This appears to be the first consideration of what
are now called Abelian encounters.

Definition 6.4. Word W is an Abelian V-instance for word $V = a_1 a_2 \cdots a_n$ provided
$W = A_1 A_2 \cdots A_n$ for nonempty words $A_i$ such that $A_i$ and $A_j$ are anagrams whenever
$a_i = a_j$. W encounters V in an Abelian sense provided some factor of W is an Abelian
V-instance.
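Definition 6.4 can be checked by brute force on small words. The following Python sketch is an illustration under our own naming (`is_abelian_instance`, `abelian_encounters`): it tries every way of cutting W into |V| nonempty blocks and requires blocks assigned to equal letters of V to be anagrams of one another.

```python
def is_abelian_instance(V, W):
    """True if W = A_1...A_n with A_i, A_j anagrams whenever V[i] == V[j]."""

    def extend(i, pos, signature):
        if i == len(V):
            return pos == len(W)
        letter = V[i]
        remaining = len(V) - i - 1
        if letter in signature:
            sorted_block = signature[letter]
            end = pos + len(sorted_block)
            if end > len(W) - remaining:
                return False
            if sorted(W[pos:end]) != sorted_block:
                return False
            return extend(i + 1, end, signature)
        for end in range(pos + 1, len(W) - remaining + 1):
            signature[letter] = sorted(W[pos:end])
            found = extend(i + 1, end, signature)
            del signature[letter]
            if found:
                return True
        return False

    return extend(0, 0, {})

def abelian_encounters(V, W):
    """True if some factor of W is an Abelian V-instance."""
    return any(is_abelian_instance(V, W[a:b])
               for a in range(len(W))
               for b in range(a + 1, len(W) + 1))

print(is_abelian_instance("aa", "abcbac"))   # abc | bac are anagrams
print(abelian_encounters("aa", "abc"))
```

The word abcbac contains no ordinary square, yet it is an Abelian instance of aa, which is the kind of distinction Erdős's question turns on.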

Currie (2005) restates and introduces a number of open problems regarding avoidability in the Abelian sense. It was in response to Currie’s paper that Tao (2014+)
proved the Abelian variant of Theorem 2.10, with which he established a lower bound 
for Zimin-avoidance. It is perhaps worth reproducing the present density results in 
the Abelian sense. 

6.2 Word Limits 

6.2.1 Convergence 

A driving force of the Graph Limits programme (see Lovász 2012) is found in the
various forms of convergence, especially for dense graphs. For example, a sequence
of graphs $\{G_n\}$ with $|V(G_n)| \to \infty$ is left-convergent provided the graph densities
$t(F, G_n)$ converge for every finite graph F. There is also a concept of right-convergence,
convergence via a cut metric, convergence of ground state energy
(from statistical physics), and more. The remarkable fact is that many of these forms
of convergence are equivalent.

Now there are multiple ways to define convergence of a sequence of words $\{W_n\}$
with length $|W_n| \to \infty$. One might define convergence in terms of factors:

• $W_n$ is an initial factor of $W_{n+1}$ for all n;

• $W_n \le W_{n+1}$ for all n;

• $d(V, W_n)$ converges for every finite word V;

• P(V is followed by x in $W_n$) converges for every word-letter pair (V, x).




Alternatively, convergence could be defined in terms of instances: 


• $W_{n+1}$ is an instance of $W_n$ for all n;

• $W_n \preceq W_{n+1}$ for all n;

• $\delta(V, W_n)$ converges for every finite word V.
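As a small illustration of the last criterion, the sketch below computes the instance density of a Zimin word in a given word, using the same convention as the code in Appendix B (number of instance substrings divided by the number of nonempty substrings). The names `is_zimin_instance` and `zimin_density` are ours. For $W_n = 0^n$ every substring of length at least 3 is a $Z_2$-instance, so the density is (n-1)(n-2)/(n(n+1)), which tends to 1.

```python
from fractions import Fraction

def is_zimin_instance(word, n):
    """True if the nonempty string `word` is an instance of the Zimin word Z_n."""
    if not word:
        return False
    if n == 1:
        return True
    # word must be A B A with B nonempty and A an instance of Z_{n-1}.
    for i in range(1, (len(word) + 1) // 2):
        if word[:i] == word[-i:] and is_zimin_instance(word[:i], n - 1):
            return True
    return False

def zimin_density(n, word):
    """Proportion of nonempty substrings of `word` that are Z_n-instances."""
    m = len(word)
    hits = sum(is_zimin_instance(word[a:b], n)
               for a in range(m)
               for b in range(a + 1, m + 1))
    return Fraction(hits, m * (m + 1) // 2)

for length in (5, 10, 20, 40):
    print(length, zimin_density(2, "0" * length))
```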

These are clearly not all equivalent, but which ones are? More importantly, which
ones are productive for a combinatorial limit theory?

6.2.2 Lexons 

The rigorous theory of convergent graph sequences is crowned by the concept of
a graphon, the limit object for dense graphs. A graphon is a symmetric function
$w : [0,1]^2 \to [0,1]$, and is determined (up to a measure 0 set and application of a
measure preserving function on [0,1]) by the set of homomorphism densities of graphs
into it. For example, the triangle-density of w is

\[
t(K_3, w) = \int_{[0,1]^3} w(x,y)\, w(y,z)\, w(z,x)\, dx\, dy\, dz.
\]

Since graphons lie in a compact space, various analytic tools can be used to develop
continuous theory that then applies to associated large graphs.
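For readers who want to experiment with the triangle-density integral, here is a minimal numerical sketch in Python. It approximates the triple integral with the midpoint rule on an n × n × n grid; the function name `triangle_density` and the grid size are our choices. Two simple graphons are used as sanity checks: the constant graphon w ≡ p has triangle density p³, and w(x, y) = xy has triangle density (1/3)³ = 1/27.

```python
def triangle_density(w, n=20):
    """Midpoint-rule approximation of t(K_3, w) on an n x n x n grid."""
    points = [(i + 0.5) / n for i in range(n)]
    total = 0.0
    for x in points:
        for y in points:
            w_xy = w(x, y)
            for z in points:
                total += w_xy * w(y, z) * w(z, x)
    return total / n ** 3

print(triangle_density(lambda x, y: 0.5))     # constant graphon, p = 1/2
print(triangle_density(lambda x, y: x * y))   # close to 1/27
```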

Question 6.5. Do there exist limit objects for free words that lie in some compact
space? Further, can we define metrics on words that extend productively to the limit
object?

For example, if we define convergence to be that “$W_n$ is an initial factor of $W_{n+1}$
for all n,” then the obvious limit object is a right-infinite word. For convergence
defined as “$W_n \le W_{n+1}$ for all n,” the limit object should be a bi-infinite word.
However, these particular forms of convergence do not appear sufficiently strong to
guarantee any form of homomorphism density in the limit object.

6.2.3 Randomness


A foundational result in graph theory is the Szemerédi Regularity Lemma, which
roughly states that the vertex set of every sufficiently large graph can be partitioned
so that the edges between parts are “random-like.” Generally, quasirandomness is used
to characterize a sequence of “random-like” graphs. Several of the many equivalent
definitions of quasirandomness are in terms of the homomorphism densities of graphs.

Question 6.6. Does there exist a productive definition of quasirandomness for free
words?

Perhaps this would be in terms of factor or instance densities, or perhaps in terms
of transition probabilities as used in the de Bruijn graph (Section 3.3).
Appendix A


Computations for Zimin Word Avoidance 


A.1 All Binary Words that Avoid $Z_2$


The following 13 words are the only words over the alphabet {0,1} that avoid the
second Zimin word, $Z_2 = aba$.


Table A.1 Binary words that avoid $Z_2$.

ε, 0, 1,
00, 001, 0011, 01, 011,
10, 100, 11, 110, 1100.

A.2 Maximum-Length Binary Words that Avoid $Z_3$


The 48 words in Table A.2 are all the words of length f(3, 2) − 1 = 28 over the alphabet
{0,1} that avoid $Z_3 = abacaba$. All binary words of length at least f(3,2) = 29
encounter $Z_3$. This result is easily computationally verified by constructing the binary
tree of words on {0, 1}, eliminating branches as you find words that encounter $Z_3$.
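The tree search just described can be sketched in a few lines of Python. The version below is an illustration with our own function names; it is run here for $Z_2$, where the search finishes instantly and finds 13 avoiding binary words (counting the empty word), the longest of length 4. Calling `avoiders(3)` performs the same search for $Z_3$ but takes noticeably longer.

```python
def is_zimin_instance(word, n):
    """True if the nonempty string `word` is an instance of the Zimin word Z_n."""
    if not word:
        return False
    if n == 1:
        return True
    for i in range(1, (len(word) + 1) // 2):
        if word[:i] == word[-i:] and is_zimin_instance(word[:i], n - 1):
            return True
    return False

def ends_with_instance(word, n):
    """True if some suffix of `word` is a Z_n-instance."""
    return any(is_zimin_instance(word[a:], n) for a in range(len(word)))

def avoiders(n, alphabet="01"):
    """All words over `alphabet` that avoid Z_n, found by pruned tree search.

    A child word is kept only if none of its suffixes is a Z_n-instance;
    its other factors were already checked when its parent was accepted.
    The search terminates because Z_n is unavoidable.
    """
    found = []
    stack = [""]
    while stack:
        word = stack.pop()
        found.append(word)
        for letter in alphabet:
            child = word + letter
            if not ends_with_instance(child, n):
                stack.append(child)
    return found

words = avoiders(2)
longest = max(len(w) for w in words)
print(len(words), longest, longest + 1)   # count, max length, f(2, 2)
```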


Table A.2 Maximum-length binary words that avoid $Z_3$.


0010010011011011111100000011, 

0010010011111100000011011011, 

0010010011111101101100000011, 

0010101100110011111100000011, 

0010101111110000001100110011, 

0010101111110011001100000011, 

0011001100101011111100000011, 

0011001100111111000000101011, 

0011001100111111010100000011, 

0011011010010011111100000011, 

0011011011111100000010010011, 

0011011011111100100100000011, 

0011111100000010010011011011 , 

0011111100000010101100110011 , 

0011111100000011001100101011 , 

0011111100000011011010010011 , 

0011111100100100000011011011, 

0011111100100101101100000011, 

0011111100110011000000101011, 

0011111100110011010100000011, 

0011111101010000001100110011, 

0011111101010011001100000011, 

0011111101101100000010010011, 

0011111101101100100100000011 , 


1100000010010011011011111100, 

1100000010010011111101101100, 

1100000010101100110011111100, 

1100000010101111110011001100, 

1100000011001100101011111100, 

1100000011001100111111010100, 

1100000011011010010011111100, 

1100000011011011111100100100, 

1100000011111100100101101100, 

1100000011111100110011010100, 

1100000011111101010011001100, 

1100000011111101101100100100, 

1100100100000011011011111100 , 

1100100100000011111101101100 , 

1100100101101100000011111100 , 

1100110011000000101011111100 , 

1100110011000000111111010100, 

1100110011010100000011111100, 

1101010000001100110011111100, 

1101010000001111110011001100, 

1101010011001100000011111100, 

1101101100000010010011111100, 

1101101100000011111100100100, 

1101101100100100000011111100 . 




A.3 A Long Binary Word that Avoids $Z_4$

Figure A.1 shows a binary word of length 10482 that avoids $Z_4 = abacabadabacaba$.
This implies that f(4, 2) ≥ 10483. The word is presented here as an image with
each row, consisting of 90 squares, read left to right. Each square, black or white,
represents a bit. For example, the longest string of black in the first row is 14 bits
long. We cannot have the same bit repeated $15 = |Z_4|$ times consecutively, as that
would be a $Z_4$-instance. A string of 14 white bits is found in the 46th row.



Figure A.1 A binary word of length 10482 that avoids $Z_4 = abacabadabacaba$.

A.4 Verifying $Z_n$-Avoidance


The code to generate a $Z_4$-avoiding word of length 10482 is messy. The following
easy-to-validate, inefficient, brute-force Sage (Stein et al. 2014) code was used for
verification of the word above. It took roughly 12 hours of computation on an
Intel® Core™ i5-2450M CPU @ 2.50GHz × 4.


# Recursive function to test if V is an instance of Z_n.
def inst(V, n):
    if len(V) == 0:
        return False
    if n == 1:
        return True
    # V must be A B A with B nonempty and A an instance of Z_(n-1),
    # so |A| is at least |Z_(n-1)| = 2^(n-1) - 1 and less than |V|/2.
    for i in range(2^(n - 1) - 1, ceil(len(V)/2)):
        if V[:i] == V[-i:]:
            if inst(V[:i], n - 1):
                return True
    return False

W = ''  # Paste word here as a string.
(L, n) = (len(W), 4)

# Check every subword V of length at least 2^n - 1.
for b in range(L + 1):
    # The "+ 1" makes the shortest checked subword have length exactly 2^n - 1.
    for a in range(b - (2^n - 1) + 1):
        if inst(W[a:b], n):
            print('%d %d %s' % (a, b, W[a:b]))

Appendix B


Computational Comparison: $\delta(Z_2, W)$ vs. $\delta(Z_3, W)$

Figure B.1 below shows plots of all (x, y)-pairs with $x = \delta(Z_2, W)$ and $y = \delta(Z_3, W)$
for binary words $W \in [2]^k$, where k ∈ {13, 16, 19, 22, 25, 28}. More discussion of these
plots is found in Section 3.1.2. The following Sage (Stein et al. 2014) code was used
to compute all (x, y)-pairs in the plots.

def is_Zn(W, n):  # Checks if nonempty W is a Zn-instance.
    if n == 1:
        return True
    for i in range(1, ceil(len(W)/2)):
        if W[:i] == W[-i:] and is_Zn(W[:i], n - 1):
            return True
    return False

def z2z3(W):  # Counts Z2- and Z3-instance substrings.
    (M, z2, z3) = (len(W), 0, 0)
    for i in range(M - 2):
        for j in range(i + 3, M + 1):
            V = W[i:j]
            if is_Zn(V, 2):
                z2 += 1
            if is_Zn(V, 3):
                z3 += 1
    return [z2, z3]

L = 10  # Change to desired word-length.
(D2, D3) = ([1], [])  # Create lists to store density values.
for n in range(2^L):  # Check every binary word of length L.
    word = str(bin(n))[2:]
    word = '0'*(L - len(word)) + word  # Pad with leading zeros to length L.
    p = z2z3(word)
    d2 = p[0]/binomial(L + 1, 2)
    d3 = p[1]/binomial(L + 1, 2)
    # D2 is kept sorted; the trailing sentinel 1 guarantees the loop stops.
    i = 0
    while d2 > D2[i]:
        i += 1
    if d2 < D2[i]:
        D2.insert(i, d2)
        D3.insert(i, set([]))
    D3[i].add(d3)
D2.pop(-1)  # Remove the unnecessary 1.


Figure B.1 $(\delta(Z_2, W), \delta(Z_3, W))$ for binary W of length k ∈ {13, 16, 19, 22, 25, 28}.

Appendix C


Proofs and Computations for Chapter 5 


C.l Proofs of Monotonicity 


Lemma C.1. For fixed q ≥ 2, $\{|F(i)|\}_{i=0}^{\infty}$ is a decreasing sequence, where


Proof. For i > 0: 


F{i) = F'^ii) = 


\m\ 


|F(z-l)| 


< 


irk=oa-q^-n 


q 


1 - 2 * 


1 - 2 ’ 


gi-2(*-i) (i_q 

1 + qi -2 


1 - gi-2' 1 + gi-2* 

(l + gi- 2 “) 

1 + g2-2*+i 
(2)-2(W-i) + 


l + (0) 

< 1 . 


□ 

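Lemma C.2. For fixed ℓ ≥ 1 and q ≥ 2, $\{|G(i)|\}_{i=0}^{\infty}$ and $\{|H(i)|\}_{i=0}^{\infty}$ are both decreasing sequences, where

\[
G(i) = G_\ell^{(q)}(i) = \frac{(-1)^i\, r\!\left(q^{-2^{i+1}}\right) \prod_{j=0}^{i-1} s\!\left(q^{-2^{j+1}}\right)}{\prod_{k=0}^{i} \left(1 - q^{1-2^{k+1}}\right)};
\]
\[
r(x) = r_\ell^{(q)}(x) = q x^{2\ell+1} - x^{4\ell} + x^{5\ell} - q x^{5\ell+1} + x^{6\ell};
\]
\[
s(x) = s_\ell^{(q)}(x) = 1 - q x^{1-\ell} + x^{-\ell};
\]
\[
H(i) = H_\ell^{(q)}(i) = \frac{(-1)^i\, u\!\left(q^{-2^{i+1}}\right) \prod_{j=0}^{i-1} v\!\left(q^{-2^{j+1}}\right)}{\prod_{k=0}^{i} \left(1 - q^{1-2^{k+1}}\right)};
\]
\[
u(x) = u_\ell^{(q)}(x) = q x^{4\ell+1} - x^{5\ell} + q x^{5\ell+1} - x^{6\ell};
\]
\[
v(x) = v_\ell^{(q)}(x) = 1 - q x^{1-\ell} + x^{-\ell} - q x^{1-2\ell} + x^{-2\ell}.
\]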

Proof. For i > 0: 

IGWI 

r{q 

s(q 

|G(^-1)| 


1 - gi-2*+' 


^l-2*(4£+2) 

_ g-2F8<?) g-2‘(10<;) _ ^l-2*{10£+2) q 


< 


< 


1 _ gl-2*(2) 

^l-2'(4£+2) q2H 

qi-2i(2i+i) _ q-2i{Ae) ' i _ ^1-2^2) 

^l-2»(3£+2) ^-l+2»(2£+l) 

gl-2*(2£+l) _ g-2«(4£) _ g2-2*(2£+3) gl-2«(4£+2) ' g-l+2»(2£+l) 

g-2‘(^+l) 

1 _ g-l-2*(2£-l) _ gl-2»(2) g2*(2<;+l) 

('2)-2l((l)+l) 

1 _ ('2)-1-21(2(1)-1) _ (2)1-2H2) + 0 

2-4 


1 - 2-3 - 2-3 


< 1 : 


(g 2 *^') v(g 


l^(^)l 


l-2*+i 


^l-2*(8£+2) _ ^-2*(10£) _|_ ^l-2‘(10^+2) _ ^-2»(12£) 
ql-2i{ie+l) _ g-2»(5£) gl-2»(5£+l) _ g-2*(6£) 

1 _ gl+2'(f-l) ^2i£ _ ^l+2*(2<;-l) ^2»(2£) 

1 _ gl-2»(2) 


< 


q 


l-2®(8€+2) 




2 *( 2 £) 


q 


-l+2*(4£+l) 


gi-2*(4<;+i) _ q-2i{u) I _ qi-2i{2) 

ql-2\'6l+2) 

ql-2i(Al+l) _ q-2'i'ibti _ g2-2*(4£+3) gl-2i(5£+2) ' g-l+2i(4£+l) 

^-2'(2£+l) 


1 - g-l-2»(€-l) _ gl-2*(2) q2\i+l) 
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('2)-2H2(1)+1) 

^ 1 - (2)-1-21((1)-1) - (2)1-2H2) + 0 

2-6 

1_ 2-1-2-3 
< 1 . 


□ 


C.2 Sage Code for Table 5.3 of $I(Z_3, q)$-Values

The following code to generate Table 5.3 was run with Sage 6.1.1 (Stein et al. 2014).

# Calculate G(i), term i of expanded g(q^(-2)).
def r(L, q, x):
    X = x^L
    return q*x*X^2 - X^4 + X^5 - q*x*X^5 + X^6

def s(L, q, x):
    return 1 - q*x^(1 - L) + x^(-L)

def G(L, q, i):
    num = prod([s(L, q, q^(-2^(j + 1))) for j in range(i)])
    den = prod([1 - q^(1 - 2^(k + 1)) for k in range(i + 1)])
    return (-1)^i * r(L, q, q^(-2^(i + 1))) * num / den

# Calculate H(i), term i of expanded h(q^(-2)).
def u(L, q, x):
    return q*x^(4*L + 1) - x^(5*L) + q*x^(5*L + 1) - x^(6*L)

def v(L, q, x):
    return 1 - q*x^(1 - L) + x^(-L) - q*x^(1 - 2*L) + x^(-2*L)

def H(L, q, i):
    num = prod([v(L, q, q^(-2^(j + 1))) for j in range(i)])
    den = prod([1 - q^(1 - 2^(k + 1)) for k in range(i + 1)])
    return (-1)^i * u(L, q, q^(-2^(i + 1))) * num / den

# Generate the first N terms of {a_n}.
def a(q, N):
    A = [0, q]
    for n in range(2, N + 1):
        A.append(q*A[-1] - ((n + 1)%2)*A[floor(n/2)])
    return A

# Calculate the partial sum of I(Z_3, q), with inner index i = 0, ..., M.
def I(q, N, M):
    A = a(q, N)
    partial = 0
    for L in range(1, N + 1):
        terms = [G(L, q, n) + H(L, q, n) for n in range(M + 1)]
        partial += A[L]*sum(terms)
    return partial

# Output bounds on I(Z_3, q) for small values of q.
# By Corollary 5.9, an odd top index for the inner (alternating) sum gives a
# lower bound and an even top index gives an upper bound.
prec = 15  # Level of precision.
N = 2*prec
for q in range(2, 7):
    print('q = %d:' % q)
    print('Lower bound with N = %d and M = 5: %s' % (N, round(I(q, N, 5), prec)))
    print('Upper bound with N = %d and M = 4: %s' % (N, round(I(q, N, 4) + 2^(-N), prec)))



Appendix D 


Word Trees Illustrating Theorem 5.7 

From Section 5.2: “For fixed bifix-free word L of length ℓ, define $b_m^{\ell}$ to count the number
of $Z_2$ words with bifix L that are $Z_2$-bifix-free q-ary words of length m.”

In each of the following images, a word is struck through if it is not counted by $b_m^{\ell}$
but its descendants are. It is hashed through if its descendants are also eliminated.


Figure D.1 The ‘000’ half of an example word tree for Theorem 5.7 with q = 2, L = ‘0’, ℓ = |L| = 1. The tree from LLLL counted by $d_n$ is boxed.

Figure D.2 Example word tree for Theorem 5.7 with q = 2, L = ‘01’, ℓ = |L| = 2. The tree from LLLL counted by $d_n$ is boxed.

Figure D.3 Example word tree for Theorem 5.7 with q = 2, L = ‘100’, ℓ = |L| = 3. The tree from LLLL counted by $d_n$ is boxed.

Figure D.4 Example word tree for Theorem 5.7 with q = 3, L = ‘0’, ℓ = |L| = 1. The tree from LLLL counted by $d_n$ is boxed.

Appendix E


Notation Index 

Generally, majuscule Greek letters are used for alphabets (especially Γ, Σ). Minuscule Greek ε (“var-epsilon”) represents the empty word, whereas ϵ is used in proofs
for arbitrarily small positive real values; other minuscule Greek letters are used for
monoid homomorphisms (especially φ, ψ).

Frequently, minuscule Roman letters are used for letters in words (especially a, b,
c, d, t, u, v, w, x, y, and z), variables (especially a, b, c, d, i, j, k, ℓ, m, n, p, q, r,
t, u, and v), or functions (especially f and g); majuscule Roman letters are used for
words (especially S, T, U, V, W, X, Y, and Z), variables (especially M and N), or
functions (especially F, G, and H). Natural numbers are also used for letters.

For notation established within a numbered definition in the text, the definition
number is given in Table E.1 below.

Table E.1 Notation used.


Notation 

Meaning 

Dehned 

Z 

The set of integers: , —2, —1,0,1,...}. 


Z+ 

The set of positive integers: {1, 2, 3,...}. 


N 

The set of natural numbers: {0,1,2,3,...}. 


f{n) ~ g{n) 

nm^^oo — r. 


/(n) = 0{g{n)) 

There exists c > 0 so that /(n) < cg{n). 


f{n) < g{n) 

f{n) = 0{g{n)). 


f(n) = o{g{n)) 

fS) = 0. 


S* 

The set of hnite S-words. 

1.1 


The set of length-n S-words. 

1.1 

e 

The empty word. 

1.1 

[n] 

The set {1, 2,..., n). 


w & W 

Letter w occurs in word W. 

1.3 


The word formed from n copies of the letter w. 

1.3 

|1T| 

The length of word W. 

1.3 

h{W) 

The set of letters that occur in word W. 

1.3 

IIW^II 

The number of letter recurrences in word W. 

1.3 

Wlz : j] 

The factor of W stretching from letter i + 1 to j. 

1.5 

V <w 

Word Id is a factor of word W. 

1.5 

V 

W encounters V. 

1.9 


The n-th Zimin word. 

1.15 

f(n,q) 

Least M such that every word in [q]^ encounters Z^. 

2.1 


Towering exponential a with b occurrences of a. 



The set of hh-instances in 

2.4 

UV,q) 

The proportion of words in that are Id-instances 

2.4 

E(.) 

The expected value of a given random variable. 


P(.) 

The probability of a given event. 


m.{n, q) 

The number of minimal Z„-instances in [g]*. 

2.11 

d(V, W) 

The factor density of word Id in word W. 

3.1 

5{y, W) 

The (instance) density of word Id in word hid. 

3.1 

m q) 

The liminf density of word Id over [q]. 

3.1 

K{V,q) 

The expected density of word Id in hh G [g]". 

4.1 

S{v,q) 

5n{V, q). 

4.1 

I{V,q) 

hm,,^oo KiV,q). 

4.1 

honi(V, W) 

The number of Id-encounters in W. 

4.2 

homniV,q) 

The expected number of Id-encounters in hh g [g]”. 

4.2 

SsuriV,W) 

1 if hh is a Id-instance; 0 otherwise. 

4.9 







